Plant retroviral polynucleotides and methods for use thereof

ABSTRACT

Retroviral and retroviral-like polynucleotides, and vectors, proteins, and antibodies derived therefrom, that are useful for the introduction of genetic information into soybeans and other plant species.

FIELD OF THE INVENTION

[0001] The present invention relates generally to retroviruses, pro-retroviral polynucleotides including pro-retroviral DNA, and pro-retroviral-like DNA, and more specifically to recombinant vectors derived therefrom for use in delivering genetic information to susceptible target plant cells.

BACKGROUND OF THE INVENTION

[0002] Repetitive DNA sequences are a common feature of the genomes of higher eukaryotes. Repetitive DNA family members in animals and higher plants are tandemly repeated or interspersed with other sequences (Walbot and Goldberg, 1979; Flavell, 1980), and may constitute more than 50% of the genome (Walbot and Goldberg, 1979). Estimates of the proportion of repetitive DNA in the soybean genome range from 36-60% (Goldberg, 1978; Gurley et al., 1979).

[0003] High copy-number repeats on the order of 10⁵ copies per haploid genome comprise only 3% of the soybean genome, whereas moderately repetitive sequences with copy numbers in the 10³ range occupy 30-40% of the genome (Goldberg, 1978). Electron micrographic examination of these moderately repetitive sequences demonstrates that they average about 2 kb in length. However, 4% of those observed exceed 11 kb (Pellegrini and Goldberg, 1979).

[0004] Most of the highly repetitive sequences in higher eukaryotic genomes are relatively short and are organized in tandem arrays. For example, the chromosomal region adjacent to the centromere in higher eukaryotes is composed of very long blocks of highly repetitive DNA, called satellite DNA, in which simple sequences are repeated thousands of times or more. Tandemly repeated elements found in the soybean genome also include the ribosomal RNA (rRNA)-encoding genes. The approximately 800 rDNA copies are organized as one or more clusters of tandemly repeated 8-kb or 9-kb units (Friedrich et al., 1979; Varsanyi-Breiner et al., 1979).

[0005] The genomes of most higher eukaryotes also contain highly repetitive sequences that are distributed evenly throughout the genome, interspersed with longer stretches of unique (or moderately repetitive) DNA. These interspersed repetitive DNA elements are variable in length, are recognizably related but not precisely conserved in sequence, and exhibit relatively small repeat frequencies (Lapitan, 1992).

[0006] The dispersal pattern of interspersed repetitive elements in higher eukaryotic genomes has led to the suggestion that they are, or once were, transposable elements known as transposons (Flavell, 1986; Lapitan, 1992). Transposons are genetic elements that can move from one chromosomal location to another without necessarily altering the general architecture of the chromosomes involved. The existence of transposons has only found general acceptance within the last few decades. Genes were originally believed to have fixed chromosomal locations that change only as a result of chromosomal rearrangements arising from illegitimate crossing-over between incompletely homologous short sections of DNA. Then, in the late 1940s, McClintock's pioneering experiments with maize showed that certain genetic elements regularly “jump”, or transpose, to new locations in the genome (McClintock, 1984).

[0007] Transposable elements (TEs) reside in the genomes of virtually all organisms (Berg and Howe, 1989). TEs encode enzymes that bring about the insertion of an identical copy of themselves into a new DNA site. Transposition events involve both recombination and replication processes that frequently generate two daughter copies of the original transposable element; one remains at the parental site, while the other appears at the target site (Shapiro, 1983).

[0008] Two major classes of eukaryotic TEs have been identified, which are distinguished by their mode of transposition (Finnegan, 1989). Class I elements transpose via the creation of an RNA intermediate that is then reverse-transcribed to create a DNA copy that integrates at the target site. This class includes several families of retroelements (retrotransposons and retroviruses), including the copia elements of Drosophila melanogaster, the gypsy/Ty3 family, the Ty1 element of yeast, the mammalian immunodeficiency retroviruses, and Rous sarcoma virus (RSV). Each of these retroelement families is characterized in part by the presence of long terminal repeats (LTRs) at their borders (Finnegan, 1989). However, this class also includes non-LTR-containing elements like Cin4 from maize (Schwarz-Sommer and Saedler, 1988) and the mammalian L1 family (Hutchinson et al., 1989).

[0009] The copia elements in D. melanogaster possess long terminal direct repeats. There are more than 11 families of copia-like elements; the members of each are well conserved and are located at 5 to 100 different sites in the Drosophila genome. These elements are about 5000 base pairs (bp) long, with long terminal repeats (LTRs) several hundred bp in length that vary in both sequence and length between families. At the termini of each element are short imperfect inverted repeats of about 10 bp.

[0010] Insertion of copia into a new chromosomal site is accompanied by replication of a 3-6 bp stretch of target DNA; the length, but not the sequence, of the direct repeats that consequently appear immediately before and after the element is the same for all members of the same family. Copia elements have one long open reading frame (ORF) that encodes proteins homologous to those of RNA tumor viruses: homologies to reverse transcriptase, integrase, and nucleic acid-binding proteins suggest that these proteins function in copia transposition through an RNA intermediate.
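The target-site duplication described above can be pictured as a simple string operation. The following is a hypothetical sketch for illustration only, not the mechanism used by copia: it ignores the staggered cuts and gap repair that actually generate the duplication, and the function names and example sequences are invented.

```python
def insert_with_tsd(target: str, element: str, site: int, tsd_len: int) -> str:
    """Model insertion of `element` into `target` with a target-site duplication (TSD).

    The `tsd_len` bases starting at 0-based offset `site` end up on both sides
    of the inserted element (the original copy plus one duplicated copy).
    """
    if tsd_len < 0 or not 0 <= site <= len(target) - tsd_len:
        raise ValueError("site/tsd_len fall outside the target sequence")
    duplicated = target[site:site + tsd_len]
    left = target[:site + tsd_len]   # flank ending with the original copy of the target stretch
    right = target[site + tsd_len:]  # remainder of the target
    return left + element + duplicated + right


def flanking_tsd(seq: str, start: int, end: int, tsd_len: int) -> bool:
    """True if the tsd_len bases before seq[start:end] equal the tsd_len bases after it."""
    if start - tsd_len < 0 or end + tsd_len > len(seq):
        return False
    return seq[start - tsd_len:start] == seq[end:end + tsd_len]


# Hypothetical 5 bp duplication, within the 3-6 bp range reported for copia.
result = insert_with_tsd("AAAACGTCTTTT", "GGGG", site=4, tsd_len=5)
print(result)                          # AAAACGTCTGGGGCGTCTTTT
print(flanking_tsd(result, 9, 13, 5))  # True
```

Because only the length of the duplication is family-specific, a search for candidate insertions in genomic sequence would test a fixed `tsd_len` per family rather than a fixed sequence.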

[0011] Class II elements, like the Drosophila melanogaster P element (Engels, 1989; Rio, 1990) and the maize Ac/Ds element (Fedoroff, 1989), transpose directly to new sites without the formation of an RNA intermediate. P elements reside at multiple sites in the Drosophila genome and are 0.5 to 1.4 kb in length, bounded by perfect inverted repeats of 31 bp. They represent internally deleted versions of a larger element of about 3 kb called a P factor, which occurs only in so-called “P strains” of Drosophila. Upon insertion into a new site in the genome, P elements create 8 bp duplications of the target sequence.
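A perfect terminal inverted repeat, such as the 31 bp repeats bounding P elements, means that the first bases of the element read 5′ to 3′ equal the reverse complement of its last bases. The sketch below measures that length for an element whose boundaries are already known; the function names and the 7 bp example repeat are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element: str, max_len: int = 50) -> int:
    """Length of the longest perfect inverted repeat shared by the two ends of `element`."""
    best = 0
    for k in range(1, min(max_len, len(element) // 2) + 1):
        if element[:k] != reverse_complement(element[-k:]):
            break  # once a length-k repeat fails, no longer perfect repeat can exist
        best = k
    return best


tir = "CATGATG"  # hypothetical 7 bp terminal repeat
element = tir + "A" * 10 + reverse_complement(tir)
print(terminal_inverted_repeat_length(element))  # 7
```

The early `break` is valid because a perfect inverted repeat of length k implies perfect repeats of every shorter length; the imperfect repeats described for copia would instead require a mismatch-tolerant comparison.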

[0012] The Ac/Ds system in maize consists of Ds elements, which, like the P elements of Drosophila, are derived from a larger complete element called Ac. Ds elements exist in several different lengths, from 0.4 to 4 kb. Unlike P elements, Ds elements remain stationary within the chromosome unless an Ac element is also present. Ds elements contain perfect inverted repeats of 11 bp at their termini, flanked by 6-8 bp direct repeats of the target DNA. When a Ds (or Ac) element transposes, it leaves behind imperfect but recognizable duplications of the 6-8 bp target sequence.

[0013] As stated above, it appears likely that many interspersed repetitive DNA families are, or once were, transposons. In soybean, an interspersed repetitive DNA family whose structural characteristics clearly define it as a transposon family is the Tgm family. The Tgm family is related to the maize En/Spm transposons and consists of fewer than 50 members ranging in size from under 2 kb to greater than 12 kb (Rhodes and Vodkin, 1988).

[0014] Retroviruses are Class I transposable elements with an RNA genome that replicates through a DNA intermediate. Although the viral genome is RNA, the intermediate in replication is a double-stranded DNA copy of the viral genome called the provirus (Watson et al., 1987). The provirus resembles a cellular gene and must integrate into host chromosomes in order to serve as a template for transcription of new viral genomes (Varmus, 1982). New genomes are processed in the nucleus by unmodified cellular machinery.

[0015] The viral genome RNA looks like a cellular messenger RNA (mRNA), but does not serve as such following infection of a cell. Instead, an enzyme called reverse transcriptase (which is not present in the cell, but is instead carried by the virion) makes a DNA copy of the viral RNA genome, which then undergoes integration into cellular chromosomal DNA as a provirus. Integration of the viral DNA is precise with respect to the viral genome, but is semi-random with respect to the host cell genome, in that some sites are utilized more frequently than others (Shih et al., 1988). The integrated provirus serves as a template for production of new viral RNA genomes, which move to the cell membrane to assemble into virions. These bud from the cell membrane without killing the cell.

[0016] Retrovirus virions have icosahedral nucleocapsids surrounded by a lipid envelope. The retroviral genome is diploid, and its general organization is well known in the art. Typical retroviruses have three protein-encoding genes: gag (group-specific antigen) encodes a precursor polypeptide that is cleaved to yield the capsid proteins; pol encodes a precursor that is cleaved to yield reverse transcriptase and an enzyme involved in proviral integration; and env encodes the precursor to the envelope glycoprotein. A fourth type of retroviral gene, called tat, has been found at the 3′ end of the HTLV-I and -II genomes; its product acts as a transcriptional trans-activator. A few retroviruses have additional genes, such as onc, that give them the ability to rapidly induce certain types of cancer.

[0017] Retroviral genomes contain LTR sequences at both their 5′ and 3′ ends (Weiss, 1984). These sequences include signals needed for replication, transcription, and post-transcriptional processing of viral RNA transcripts. The LTRs are perfect direct repeats created by the addition of sequences (called U₅ and U₃, derived from the opposite ends of the viral genome) to each end of the viral genome during the creation of the double-stranded DNA intermediate. The U₅ region appears to be essential for initiation of reverse transcription and for packaging of viral transcripts (Murphy and Goff, 1988). The U₃ region contains a number of cis-acting signals for viral replication, and sequences responsible for much or all of the transcriptional control over viral genes.
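Because the LTRs of a newly formed provirus are perfect direct repeats, the LTR length of an idealized provirus can be read off as the longest non-overlapping stretch that appears identically at both ends. This is a minimal sketch assuming the element's exact boundaries are known and its LTRs have not diverged; real elements usually require alignment of the two LTRs. The function name and example sequence are hypothetical.

```python
def terminal_direct_repeat_length(provirus: str) -> int:
    """Length of the longest non-overlapping perfect direct repeat at both ends of a sequence."""
    best = 0
    # Unlike inverted repeats, a shorter prefix/suffix match does not follow from a
    # longer one, so every candidate length is checked.
    for k in range(1, len(provirus) // 2 + 1):
        if provirus[:k] == provirus[-k:]:
            best = k
    return best


ltr = "TGTCAGACCA"  # hypothetical 10 bp "LTR"
provirus = ltr + "G" * 8 + ltr
print(terminal_direct_repeat_length(provirus))  # 10
```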

[0018] Retroviral genomes also contain a primer binding site (PBS) near the 5′ end (Dahlberg et al., 1974). This sequence is complementary to the 3′ end of a cellular tRNA. The tRNA is captured from the host cell during replication and serves as a primer for reverse transcription of the RNA genome soon after infection.
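Complementarity between a PBS and a tRNA is antiparallel: with both sequences written 5′ to 3′, the 3′-terminal base of the tRNA pairs with the 5′-most base of the PBS. The sketch below counts contiguous Watson-Crick pairs under that convention; it ignores G·U wobble pairs, and the function name and both example sequences are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def pbs_trna_pairing_length(pbs: str, trna: str) -> int:
    """Count contiguous Watson-Crick pairs between the 5' end of a PBS and the 3' end of a tRNA.

    Both sequences are given 5'->3'; DNA input is accepted (T is read as U).
    """
    pbs = pbs.upper().replace("T", "U")
    trna = trna.upper().replace("T", "U")
    paired = 0
    for i in range(min(len(pbs), len(trna))):
        # tRNA base i positions from its 3' end pairs with PBS base i from its 5' end
        if RNA_COMPLEMENT.get(trna[-1 - i]) != pbs[i]:
            break
        paired += 1
    return paired


trna_3prime = "AAACGGUCCA"  # hypothetical tRNA 3' end (mature tRNAs end in CCA)
pbs = "UGGACCGUAGAG"        # hypothetical PBS
print(pbs_trna_pairing_length(pbs, trna_3prime))  # 8
```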

[0019] Once the provirus is integrated into cellular chromosomal DNA, it is stable and replicates along with the host cell DNA. Proviruses are never excised from the site of integration, although they may be lost as a result of deletions. Retrovirus infections usually do not harm the cell, and infected cells continue to divide, with the integrated provirus serving as a template to direct viral RNA synthesis.

[0020] Like all viruses, retroviruses have a specific requirement for interaction with a target cell-surface receptor molecule for infection. In all known (and suspected) cases, this molecule is a protein that interacts specifically with a virion env protein. The best-studied interaction between a virion envelope protein and a cell-surface receptor is that of HIV with the CD4 receptor on human T cells (Dalgleish et al., 1984). The env protein appears to bind to a small region on the receptor not involved in cell-cell recognition or any other known function. Another retrovirus whose cellular receptor has been identified is Moloney murine leukemia virus (MMLV), which interacts with a cell-surface protein that resembles a membrane pore or channel protein. Although the mechanism of interaction of many retroviruses is not yet well understood, it does appear that retroviruses interact with a wide variety of receptor types (Weiss, 1982).

[0021] Retroviruses have been studied intensely over the past several decades, mainly because of their ability to cause tumors in animals and to transform cells in culture. The ability of retroviruses to transform cells is based on at least two mechanisms. The first is that certain viruses have incorporated cellular proto-oncogenes that, upon mutation, have acquired the ability to deregulate cell growth. The second mechanism of transformation results from insertional mutagenesis upon integration of the viral genome. Because the viral LTRs have promoter and enhancer activities, insertion of an LTR sequence in either orientation adjacent to a cellular gene may lead to inappropriate expression of that gene. If the cellular gene is involved in regulation of cell growth, over- or under-expression or insertional mutagenesis of that gene may lead to uncontrolled growth of the cell.

[0022] Retroviral integration is thus potentially mutagenic. Integration of retrotransposons within exonic coding regions may inactivate those genes, while integration within introns or flanking regions may create novel regulatory patterns with significant developmental and evolutionary implications (McDonald, 1990; Robins and Samuelson, 1993; Schwarz-Sommer and Saedler, 1987; Weil and Wessler, 1990; White et al., 1994). Enhancers and trans-activating sequences have been found in retroviral and retrotransposon LTRs (Boeke, 1989; Cavarec et al., 1994; Choi and Faller, 1994; Lohning and Ciriacy, 1994; Mellentin-Michelotti et al., 1994; Varmus and Brown, 1989), and retrotransposon insertions between coding regions and enhancers disrupt gene expression (Cai and Levine, 1995; Georgiev and Corces, 1995; Geyer and Corces, 1992; White et al., 1994).

[0023] Element mobilization not only modifies target gene activity, it restructures genomic architecture (King, 1992; Lim and Simmons, 1994; McDonald, 1993; Shapiro, 1992). In fact, one of the major genomic differences between related taxonomic groups appears to be the identity and distribution of repetitive elements, not single-copy coding sequences (McDonald, 1993; Shapiro, 1992). White et al. (1994) have demonstrated that the flanking regions of many maize genes are embedded in sequences containing traces of retrotransposon DNA. Moreover, Palmgren (1994) has found that the BstI retroelement from maize encodes two conserved domains found in plant membrane H⁺-ATPases, suggesting that element acquisition of host sequences is not confined to vertebrate retroviruses.

[0024] McClintock (1984) has proposed that genetic variation, induced in part by transposable element-mediated insertional mutagenesis, is a directed response to conditions that create “genomic stress.” Many TEs and retroviruses preferentially insert in transcriptionally active regions of the genome (Engels, 1989; Sandmeyer et al., 1990; Varmus and Brown, 1989). The Ty1 retrotransposon in yeast can be activated by growth at sub-optimal temperatures (Paquin and Williamson, 1988) and by exposure to radiation (McEntee and Bradshaw, 1988). Similar observations have been made in Drosophila (McDonald et al., 1988; Strand and McDonald, 1985), maize (McClintock, 1984), and soybean (Sheridan and Palmer, 1977).

[0025] In plants, TEs are activated during the induction of tissue culture (Hirochika, 1993; Peschke and Phillips, 1991) and may contribute to the somaclonal variation observed in a number of higher plant species, including soybean (Amberger et al., 1992; Freytag et al., 1989; Graybosch et al., 1987; Roth et al., 1989). In maize, the activation of transposable elements is correlated with changes in the pattern of DNA methylation that occur during induction of cultures (Brettell and Dennis, 1991; Kaeppler and Phillips, 1993; Peschke et al., 1991), suggesting a mechanism for element activation.

[0026] In plants, most transposon-like sequences appear to be extinct (Grandbastien, 1992). Although a number of plant species harbor these sequences (Flavell et al., 1992; Grandbastien, 1992; Voytas et al., 1992), active transposition has only been demonstrated or directly implicated in tobacco (Grandbastien et al., 1989; Pouteau et al., 1994) and maize (Johns et al., 1985). RNA transcripts and cDNAs from transposons have been recovered from tobacco (Pouteau et al., 1994; Hirochika, 1993) and maize (Hu et al., 1995), and transposable element-related proteins have been detected in maize (Hu et al., 1995).

[0027] The stable introduction of foreign genes into plants represents one of the most significant developments in a continuum of advances in agricultural technology that includes modern plant breeding, hybrid seed production, farm mechanization, and the use of agrichemicals to provide nutrients and control pests. Genetic engineering has been applied to many species in efforts to improve production efficiency and environmental conservation. Genetic engineering complements plant breeding efforts by increasing the diversity of genes and germplasm available for incorporation into crops and shortening the time required for the production of new varieties and hybrids, while also providing opportunities to develop new agricultural products and manufacturing processes.

[0028] The first transgenic plants were tobacco plants transformed with a chimeric neomycin phosphotransferase gene carried on the Ti plasmid of Agrobacterium tumefaciens (Horsch et al., 1984). Agrobacterium-mediated Ti plasmid transfer has proved to be an efficient, versatile method of plant transformation. The range of plant species amenable to genetic engineering using Agrobacterium is fairly large. In those systems where Agrobacterium-mediated transformation is efficient, it is the method of choice because of the facile and defined nature of the gene transfer.

[0029] However, few monocotyledonous plants appear to be natural hosts for Agrobacterium, although transgenic plants have been produced in asparagus and transformed tumors have been observed in yam. Many commercially valuable crop species, such as cereal grains (e.g., rice, maize, and wheat), are not efficiently transformed by Agrobacterium, despite extensive efforts made in this direction. This appears to be due to differences in the wound response; those species recalcitrant to Agrobacterium-mediated transformation probably do not express the required wound response (Potrykus, 1991).

[0030] Physical methods of gene delivery have been developed in order to transform plants not susceptible to Agrobacterium. These methods include biolistic projection (“particle gun”), microinjection, electroporation, and lipofection (Potrykus, 1991). Most physical transformation experiments have utilized plant protoplasts as the recipient cells. However, other regenerable explants have been utilized, including leaves, stems, and roots. Many plant species have been successfully transformed with physical techniques, but some, notably legumes and cereals, have proved difficult to stably transform by these methods. The applicability of such physical methods to these plants is limited by the difficulties involved in regenerating plants from protoplasts, although some success in this regard has been achieved with some cereals, including rice. Little success has been achieved with soybean or maize.

[0031] Little experimentation has been reported regarding the use of viral vectors for transformation of plants. Plant viruses exist in a variety of forms; they contain either DNA or RNA as their genetic material, have either rod- or polyhedral-shaped capsids, and can be transmitted by insects, bacteria, or contact with wounded regions (Robertson et al., 1983). Most known plant viruses contain single-stranded (+) RNA as their genetic material. These (+) strand plant viruses can be further divided into those that possess a single RNA chain and those that have several RNA chains, each of which is necessary for viral infectivity and is encapsidated in a separate virion. Cowpea mosaic virus, for example, contains two RNAs, one encoding several proteins including terminal protein and a protease, with the other chain encoding capsid proteins. There also exist segmented double-stranded RNA plant viruses. The best known of these is wound tumor virus (WTV), which contains 12 different segments and can replicate in either insect or plant cells.

[0032] There are fewer plant DNA viruses. Only two known classes exist, one of which contains double-stranded DNA and has a polyhedral capsid. The best understood member of this class is cauliflower mosaic virus (CaMV). The second class of DNA plant viruses comprises the geminiviruses, which consist of paired capsids held together like twins, with each capsid containing a circular single-stranded DNA of about 2500 nucleotides. In some cases, the two paired genomes are identical, while in other cases, the two bear almost no sequence relationship.

[0033] Early work with a DNA virus showed that a small bacterial antibiotic resistance gene inserted into such a virus could spread systemically throughout infected plants and confer resistance (Brisson et al., 1984). It has been suggested that the small size of DNA viral genomes prevents the wide application of such vectors as transforming agents in plants. However, little has been done to follow up on this work.

[0034] Even less work has been performed in plants regarding the application of genetic engineering to the far larger group of plant RNA viruses (Ahlquist et al., 1987; Ahlquist and Pacha, 1990). It has been suggested that because the viral RNA does not integrate into the host genome, and is excluded from the meristems and offspring, the usefulness of such RNA viruses in plant transformation is limited at best (Potrykus, 1991).

SUMMARY OF THE INVENTION

[0035] In one aspect, the present invention provides retroviral and retroviral-like polynucleotides derived from a plant, wherein such polynucleotides are capable of integration into the genome of a plant cell. The invention is also directed to other plant retroviral or retroviral-like polynucleotides obtainable by hybridization under stringent conditions (see, e.g., Sambrook et al.) with the retroviral or retroviral-like polynucleotides expressly disclosed herein. Also within the scope of this aspect of the invention are regulatory sequences comprising, for example, plant retroviral long terminal repeat (LTR) sequences that may be operably linked to a gene so as to modulate expression of the linked gene.

[0036] In a second aspect, the invention is directed to plant retroviral or retroviral-type elements capable of targeted integration into a specific region in the plant genome and further to methods for accomplishing such integration.

[0037] In a third aspect, the present invention is directed to vectors containing all or part of a regulatory sequence derived from a plant retrovirus or retrovirus-like polynucleotide, and to vectors comprising all or part of the retroviral or retroviral-like genome and a heterologous gene.

[0038] In a fourth aspect, the invention is directed to vectors containing one or more plant retroviral or retroviral-like regulatory sequences operably linked to a heterologous gene. A heterologous gene in the context of the present application refers to a gene or gene fusion or a part of a gene derived from a source other than the plant pro-retrovirus, or a cDNA, or a plant retroviral gene under the regulatory control of a promoter other than its natural promoter.

[0039] In a fifth aspect, the invention is directed to isolated purified proteins encoded by the polynucleotides disclosed herein, and to analogs, homologs, and fragments of such proteins that retain at least one biological property of the proteins.

[0040] In a sixth aspect, the invention is directed to isolated purified proteins produced by expression of a heterologous gene using the vectors of the present invention.

[0041] In a seventh aspect, the invention is directed to methods for using vectors comprising all or part of a plant pro-retroviral or retroviral genome and vectors comprising plant retroviral regulatory sequences operably linked to a heterologous gene to introduce a heterologous gene or a regulatory element into a plant genome, wherein the expression product of the gene comprises a polypeptide or an antisense RNA and wherein the regulatory element is a transcriptional regulatory element.

[0042] In an eighth aspect, the invention is directed to a plant retrovirus comprising a plant retroviral or retroviral-like polynucleotide, a capsid, and an envelope.

[0043] In a ninth aspect, the invention is directed to methods for producing a plant retrovirus, in which the plant retroviral polynucleotide is packaged in a capsid and envelope, preferably through the use of a packaging cell line, but alternatively by use of other vector systems or by in vitro reconstitution of the retroviral capsid and envelope.

[0044] In a tenth aspect, the invention is directed to plant cells that have been transformed by transduction of a plant retroviral polynucleotide or transformed by a plant retrovirus comprising a heterologous gene according to the methods of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0045] FIG. 1 shows the DNA sequence of the oligonucleotide used as a primer in the polymerase chain reaction that generated the plant pro-retrovirus SIRE1-1 cDNA Gm776 (SEQ ID NO: 1). The 5′ and 3′ ends of the oligonucleotide are indicated, and degenerate sites (wherein the oligonucleotide mix contained equal proportions of two nucleotides at a given site) are indicated in parentheses.
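A degenerate primer written with alternative nucleotides in parentheses stands for a mix of distinct oligonucleotides, one for every combination of choices. The sketch below expands that notation; the parser, its accepted syntax, and the example primer are hypothetical and do not reproduce the actual FIG. 1 sequence.

```python
import itertools
import re

TOKEN = re.compile(r"\(([ACGT]+)\)|([ACGT])")
VALID = re.compile(r"(?:\([ACGT]+\)|[ACGT])+")


def expand_degenerate(primer: str) -> list[str]:
    """List every oligonucleotide in a degenerate mix written as, e.g., 'AC(AG)T'.

    Each parenthesized group lists the alternative nucleotides at that site;
    all other positions are fixed.
    """
    primer = primer.upper()
    if not VALID.fullmatch(primer):
        raise ValueError(f"unrecognized primer notation: {primer!r}")
    choices = [group or base for group, base in TOKEN.findall(primer)]
    return ["".join(combo) for combo in itertools.product(*choices)]


variants = expand_degenerate("AC(AG)T(CT)")  # hypothetical primer, not SEQ ID NO: 1
print(variants)  # ['ACATC', 'ACATT', 'ACGTC', 'ACGTT']
```

With n two-fold degenerate sites the mix contains 2ⁿ distinct sequences, so each individual sequence makes up roughly 1/2ⁿ of the mix when the alternatives are present in equal proportions, as described for FIG. 1.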

[0046] FIG. 2 presents the nucleotide sequence of the SIRE1-1 cDNA Gm776 (SEQ ID NO: 2). The regions corresponding to the oligonucleotide primer used to amplify the cDNA are underlined.

[0047] FIG. 3 depicts a restriction map of the SIRE1-1 Gm776 cDNA sequence.

[0048] FIG. 4 shows a statistical analysis of sequence similarities between Gm776 and retrotransposons from A. thaliana and Saccharomyces cerevisiae.

[0049] FIGS. 5A and 5B set forth the DNA sequences of oligonucleotides (SEQ ID NOS: 12-24) utilized in sequencing Gm776 and the 2.4 kb SIRE1-1 cDNA.

[0050] FIG. 6 sets out the nucleotide sequence (SEQ ID NO: 3) of the 2.4 kb SIRE1-1 cDNA isolated from a lambda gt11 soybean cDNA library.

[0051] FIG. 7 depicts a restriction map of the 2.4 kb SIRE1-1 cDNA.

[0052] FIG. 8 depicts the organization of the 2.4 kb SIRE1-1 cDNA.

[0053] FIG. 9 shows a comparison of the predicted SIRE1-1 CX₂CX₄HX₄C (SEQ ID NO: 60) nucleic acid-binding site sequences (SEQ ID NO: 4 and SEQ ID NO: 61) with the amino acid sequences of those in other nucleocapsid proteins (SEQ ID NOS: 62-68).
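The CX₂CX₄HX₄C nucleic acid-binding motif (cysteine, any two residues, cysteine, any four, histidine, any four, cysteine) maps directly onto a regular expression. This is a sketch for locating candidate motifs in a predicted protein sequence; the function name and the example protein are hypothetical and are not the SIRE1-1 sequences.

```python
import re

# C-X2-C-X4-H-X4-C, where X is any residue; the lookahead allows overlapping hits.
CCHC_MOTIF = re.compile(r"(?=(C.{2}C.{4}H.{4}C))")


def find_cchc_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each CX2CX4HX4C motif."""
    return [(m.start() + 1, m.group(1)) for m in CCHC_MOTIF.finditer(protein.upper())]


print(find_cchc_motifs("MKCAYCGKEGHIKSRCPE"))  # [(3, 'CAYCGKEGHIKSRC')]
```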

[0054] FIG. 10 shows a comparison of the predicted amino acid sequence (SEQ ID NO: 5) of the putative SIRE1-1 protease domain with the amino acid sequences of other retroelement proteases (SEQ ID NOS: 69-75).

[0055] FIG. 11 shows an alignment of the RNA sequence (SEQ ID NO: 6) of the putative SIRE1-1 primer binding site to the 3′ end of soybean tRNA^(met−1) (SEQ ID NO: 76). Identity between the sequences is indicated by a vertical line (|).

[0056] FIG. 12 shows a sequence alignment between the 3′ termini of the putative 5′ LTR of SIRE1-1 (SEQ ID NO: 7) and the 5′ LTR of the potato retrotransposon Tst1 (SEQ ID NO: 77). Identity between the sequences is indicated by a vertical line (|).
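The vertical-line convention used in FIGS. 11 and 12 can be generated from any two pre-aligned sequences of equal length. The sketch below assumes gaps are written as "-" and computes percent identity over the full alignment length, gap columns included; other denominators (for example, ungapped positions only) are equally common. The names and example fragments are hypothetical.

```python
def identity_line(a: str, b: str, gap: str = "-") -> str:
    """Mark identical, non-gap positions of two aligned sequences with '|'."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return "".join("|" if x == y and x != gap else " " for x, y in zip(a.upper(), b.upper()))


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * identity_line(a, b, gap).count("|") / len(a)


a = "TGTTGG-CA"  # hypothetical aligned fragments
b = "TGATGGTCA"
print(a)
print(identity_line(a, b))
print(b)
print(round(percent_identity(a, b), 1))  # 77.8
```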

[0057] FIG. 13 sets out the DNA sequence (SEQ ID NO: 8) of the 4.2 kb fragment of the SIRE1-1 genomic clone isolated from a lambda bacteriophage FIX II soybean genomic library.

[0058] FIG. 14 depicts the organization of the 4.2 kb SIRE1-1 genomic fragment.

[0059] FIG. 15 shows the predicted amino acid sequences of the SIRE1-1 open reading frames ORF1 (single underline) (SEQ ID NO: 9) and ORF2 (SEQ ID NO: 59) (double underline) encoded by the 4.2 kb SIRE1-1 genomic fragment. The sequences formed by stop codons are also shown (SEQ ID NO: 85 and SEQ ID NO: 86).

[0060] FIG. 16 shows the predicted amino acid sequence (SEQ ID NO: 84) encoded by the SIRE1-1 open reading frame ORF2. The putative signal peptide sequence (residues 22-43) and hydrophobic anchor sequence (residues 511-531) are underlined.
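The text does not state how the signal peptide and hydrophobic anchor were identified. One common first pass for locating such hydrophobic stretches is a Kyte-Doolittle hydropathy profile averaged over a sliding window; the 19-residue window and 1.6 threshold below are assumptions drawn from common practice for membrane-spanning segments, not parameters taken from this document. The function names and the toy protein are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry i covers residues i+1..i+window (1-based)."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping above-threshold windows into 1-based (start, end) residue ranges."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the current segment
            else:
                segments.append((start, end))
    return segments


toy = "D" * 10 + "L" * 20 + "D" * 10  # hypothetical protein; hydrophobic residues 11-30
print(hydrophobic_segments(toy))  # [(6, 35)]
```

The reported range is wider than the hydrophobic core because every qualifying window is reported in full; residue-level boundaries such as 22-43 or 511-531 would normally be refined by inspecting the profile or by dedicated signal-peptide and transmembrane predictors.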

[0061] FIG. 17 shows a comparison of the predicted amino acid sequence (SEQ ID NO: 11) of the SIRE1-1 ORF1 with the C-terminal region of the copia RNase H polypeptide (SEQ ID NO: 78). Vertical lines (|) indicate identity between the sequences, whereas conservative and semi-conservative substitutions are indicated by (:) and (.), respectively.

[0062] FIG. 18 shows a restriction map of the SIRE1-1 genomic clone isolated from a λ bacteriophage FIX II soybean genomic library. The 5′ and 3′ ends of the insert are at the left and right, respectively. The numbers above and below the schematic indicate the approximate lengths of the restriction fragments. The restriction endonuclease recognition sites are indicated by single-letter codes: H represents a Hind III site; X represents an Xba I site; and N represents a Not I site. The boxed regions of the schematic represent open reading frames encoding SIRE1-1 proteins: int represents the integrase domain; RT represents the reverse transcriptase domain; RH represents the ribonuclease H domain; and env represents the envelope protein domain. The rightmost (open) box represents the 3′ soybean flanking region.

[0063] FIG. 19 shows the DNA sequences (SEQ ID NOS: 25-38) of oligonucleotide primers used to sequence the 4.2 kb genomic fragment. The numbering in the second column indicates the position of the primer sequence with reference to the predicted sense strand of the genomic fragment. Also shown are M13/pUC forward (SEQ ID NO: 12) and reverse (SEQ ID NO: 14) oligonucleotide sequences.

[0064] FIG. 20 shows the results of a computer analysis performed on the predicted ORF2 amino acid sequence (SEQ ID NO: 55) using the computer program NNpredict (Kneller et al., 1990).

[0065] FIG. 21 shows a nucleotide sequence comparison among the SIRE1-1 3′ LTR (LTR2) (SEQ ID NO: 58) and the gag R1 (SEQ ID NO: 57) and R2 (SEQ ID NO: 56) regions. The numbers following the sequence designations indicate the respective locations of the regions within the SIRE1-1 4.2 kb genomic fragment.

[0066] FIG. 22 depicts a nucleotide sequence comparison between Gm776 (SEQ ID NO: 2) and the 2.4 kb SIRE1-1 cDNA (SEQ ID NO: 3). The Gm776 DNA sequence is in reverse orientation (i.e., in the 3′ to 5′ orientation) relative to the 2.4 kb cDNA sequence.

[0067] FIG. 23 shows the predicted amino acid sequence (SEQ ID NO: 83) of ORF2. The putative hydrophobic transmembrane regions are indicated by a single underline. The predicted coiled-coil regions are indicated by a double underline. The proline-rich region is indicated by a dotted underscore. The predicted α-helical regions are indicated in boldface type. The potential SU/TM cleavage sites are indicated by boxes.

[0068] FIG. 24 depicts an agarose gel electrophoretic analysis of restriction endonuclease digestion of the SIRE1-1 λFIXII genomic DNA by Hind III. Lane 1 contains λ DNA size markers. Lane 2 contains the SIRE1-1 λFIXII genomic DNA digested by Hind III. The relative lengths of the Hind III fragments are indicated by the numbers (e.g., 2.1 H is a 2.1 kb Hind III fragment).
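The fragment sizes in FIG. 24 are estimated from gel mobility relative to size markers. When a sequence is available, the expected Hind III fragment sizes can instead be predicted from the positions of the recognition site AAGCTT, which Hind III cleaves between the two A residues. This is a minimal sketch that assumes a linear molecule and complete digestion and reports top-strand lengths only; the function name and example sequence are hypothetical.

```python
HIND_III_SITE = "AAGCTT"  # palindromic, so scanning one strand finds every site
HIND_III_CUT_OFFSET = 1   # top-strand cut: A^AGCTT


def hind_iii_fragment_lengths(seq: str) -> list[int]:
    """Top-strand fragment lengths after complete Hind III digestion of a linear DNA sequence."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(HIND_III_SITE)
    while pos != -1:
        cuts.append(pos + HIND_III_CUT_OFFSET)
        pos = seq.find(HIND_III_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


demo = "G" * 100 + "AAGCTT" + "C" * 200 + "AAGCTT" + "T" * 50  # hypothetical 362 bp sequence
print(hind_iii_fragment_lengths(demo))  # [101, 206, 55]
```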

[0069] FIG. 25 shows a schematic representation of the results of restriction endonuclease digestion and Southern hybridization analyses of the SIRE1-1 genomic clone. The length and nature of each fragment are indicated by the alphanumerical designation at the left (e.g., 1.5H is a 1.5 kb Hind III fragment). The fragment(s) recognized by each probe (i.e., env, gag, LTR) are indicated by the arrows.

[0070] FIG. 26 presents the result of a restriction endonuclease digestion and Southern hybridization analysis of the SIRE1-1 genomic clone. The SIRE1-1 genomic clone was digested with Sac I and Hind III. The lengths of the hybridizing fragments are indicated at the left. The Southern hybridization was performed with a radioactively labeled env probe derived from the 4.2 kb Xba I fragment.

[0071] FIG. 27 presents a schematic of the pEG4.1 vector construct. The 4.1 kb SIRE1-1 insert is indicated by the thick bolded clockwise arrow.

[0072] FIG. 28 depicts the result of restriction endonuclease digestion and Southern hybridization analysis of the pEG4.3 vector construct comprising the 4.3 kb SIRE1-1 Hind III fragment. The Southern hybridization was performed using a radioactively labeled gag probe derived from the 4.2 kb SIRE1-1 Xba I fragment.

[0073] FIG. 29 presents a schematic of the pEG4.3 vector construct. The 4.3 kb SIRE1-1 insert is indicated by the thick bolded clockwise arrow.

[0074] FIG. 30 presents the sequences (SEQ ID NOS: 39-49) of the oligonucleotide primers used to sequence the 4.1 kb and 4.3 kb SIRE1-1 Hind III fragments contained in pEG4.1 and pEG4.3, respectively. A lower-case c following a primer designation indicates that the primer was used to sequence the (−) strand of the insert. Also shown are the pUC forward (SEQ ID NO: 12) and reverse (SEQ ID NO: 14) oligonucleotide sequences.

[0075] FIGS. 31(a)-(c) present the nucleotide sequence (SEQ ID NO: 50) of the SIRE1-1 genomic clone derived from the sequences of the 4.1 and 4.3 kb SIRE1-1 Hind III fragments. The first 321 nucleotides of the sequence are derived from the 3′ terminus of the 4.3 kb Hind III fragment, and the remaining sequence is derived from the 4.1 kb Hind III fragment. The Hind III restriction endonuclease recognition site is indicated in boldface (nt 322-327).

[0076] FIG. 32 presents the amino acid sequence (SEQ ID NO: 51) of the predicted open reading frame encoded by the combined nucleotide sequences of the 4.3 kb and 4.1 kb Hind III fragments of the SIRE1-1 genomic clone.

[0077] FIG. 33 presents a comparison of the predicted amino acid sequence (SEQ ID NO: 52) of the SIRE1-1 int domain with the integrase domain of the Opie-2 retroelement (SEQ ID NO: 79) from maize. The amino acid residues constituting the conserved HHCC and D(10)D(35)E motifs are presented in boldface. A (.) represents a gap in the sequence required for optimal alignment. A (|) represents identity between the residues. A (:) represents similarity between the residues.

[0078] FIG. 34 presents a comparison of the predicted amino acid sequence (SEQ ID NO: 53) of the SIRE1-1 reverse transcriptase (RT) domain with the reverse transcriptase domain of the Opie-2 retroelement from maize (SEQ ID NO: 80). The regions corresponding to conserved retroelement RT domains are presented in boldface. A (|) represents identity between the residues. A (:) represents similarity between the residues.

[0079] FIG. 35 presents a comparison of the predicted amino acid sequence (SEQ ID NO: 54) of the SIRE1-1 Ribonuclease H (RH) domain with the Ribonuclease H domain of the Opie-2 retroelement from maize (SEQ ID NO: 81). The conserved DEDD motif is indicated in boldface. A (|) indicates identity between the residues. A (:) indicates similarity between the residues. A (.) indicates a gap in the sequence required for optimal alignment.

[0080] FIG. 36 presents an alignment of the SIRE1 gene sequences SIRE1-1, SIRE1-7, SIRE1-8 and SIRE1-9. Based on the SIRE1-1 sequence, the coding regions are set out as follows: the LTR sequences span approximately nucleotides 1-1154 and from nucleotide 8851 to the end; the gag-pol region spans approximately nucleotides 1213-5958; and the env region spans approximately nucleotides 5959-8038. Nonsense mutations in SIRE1-1 near the start of each ORF are highlighted in bold.

[0081] FIG. 37 highlights possible transcriptional elements in the SIRE1-7 LTR. The dof-like binding sites are in bold, and the MYB-like binding sites are in bold italics. The direct repeats are underlined with distinct patterns to differentiate them by sequence. The 7 bp and 20 bp tandem repeats are each underlined with their own distinct pattern. The putative TATA box is shaded in black, the putative polyA signal is shaded in gray, and the putative RNA start site is marked with a symbol in the figure.

[0082] FIG. 38 presents a modified CLUSTALW alignment of the interval between ORF2 and the 3′ LTR. The ORF2 stop codon and the 5′ end of the LTR are shaded in black. The PPT and PPT-like tracts are shaded in gray. Short direct repeats that flank some indels are underlined. The imperfect long tandem repeat is boxed, with the first member boxed in solid lines and the second member boxed in dashed lines.

DETAILED DESCRIPTION OF THE INVENTION

[0083] The present invention provides novel plant retroviruses, proretroviruses, proretroviral polynucleotides, proretroviral DNAs, proretroviral-like polynucleotides and plant retroviral derivatives that are useful for genetic engineering in plants. More particularly, the plant retroviruses, proretroviruses, proretroviral polynucleotides, proretroviral DNAs, proretroviral-like polynucleotides, and plant retroviral derivatives derived therefrom are useful for: introducing a heterologous DNA of interest into plant cells, where the peptide or polynucleotide encoded by that sequence will be expressed; introducing a DNA sequence of interest into plant cells, where the RNA encoded by that sequence is complementary (antisense) to an endogenous plant polynucleotide; introducing a DNA sequence into a plant cell, where that sequence becomes integrated into a plant genome; integrating gene regulatory elements, such as transcriptional regulatory sequences, into a plant genome; and identifying the location of such integrations.

[0084] The invention provides vector constructs comprising plant proretroviral polynucleotides, proretroviral DNAs, proretroviral-like polynucleotides, fragments thereof, and retroviral derivatives derived therefrom that are useful for expressing desired proteins in target plant cells (for example, proteins that confer enhanced growth, disease resistance, or herbicide tolerance to plant cells) or for expressing "antisense" RNA complementary to an endogenous plant polynucleotide.

[0085] The invention also provides methods for: producing a plant retroviral vector; using a plant retroviral polynucleotide to identify genetic loci and to characterize the function of a gene within a plant genome; introducing mutations into a plant genome or disrupting an endogenous plant gene ("knockout"); and inserting genes or gene regulatory elements into genomic loci of plants.

[0086] The following examples are illustrative of certain embodiments of the present invention but are not to be construed as limiting thereof.

[0087] Example 1 describes the isolation and characterization of the SIRE1-1 cDNA.

[0088] Example 2 describes the isolation and characterization of a full-length SIRE1-1 clone from a soybean genomic library.

[0089] Example 3 describes the analysis of transcriptional activity from the SIRE1-1 pro-retrovirus in soybean and other plants.

[0090] Example 4 describes the detection of SIRE1-1 retrovirally encoded protein expression in plant tissues by Western blot analysis.

[0091] Example 5 describes the in vitro production of polypeptides from SIRE1-1-encoded mRNAs.

[0092] Example 6 describes the use of SIRE1-1 in non-replicative transduction of plant cells.

[0093] Example 7 describes methods and products for the production of plant retrovirus packaging cells.

[0094] Example 8 describes methods for transduction of plant retroviral polynucleotides into plant cells.

[0095] Example 9 describes the use of SIRE1 as a gene transfer vector.

[0096] Example 10 describes the use of SIRE1 to induce and tag mutations in plant genomes.

[0097] Example 11 describes the modification of SIRE1 to effect directed integration at a specific locus in a plant genome.

[0098] Example 12 describes the use of SIRE1 and flanking DNA sequences to determine the site of SIRE1 insertion in the soybean genome.

[0099] Example 13 describes the sequences of SIRE1-7, SIRE1-8 and SIRE1-9.

[0100] Example 14 describes the sequence alignment of the SIRE1 genes SIRE1-1, SIRE1-7, SIRE1-8, and SIRE1-9.

EXAMPLE 1

[0101] Isolation and Characterization of SIRE1-1 cDNA

[0102] The initial characterization of the SIRE1-1 retroviral DNA was based on the fortuitous recovery and analysis of a 776-bp DNA fragment (Gm776) generated by the polymerase chain reaction (PCR) in an attempt to amplify soybean DNA coding for a cytokinin biosynthetic enzyme (Laten and Morris, 1993). Amplification of either total DNA (from etiolated plumules of Glycine max cv Williams, isolated by the method of Doyle and Doyle, 1990) or nuclear DNA (from G. max cv Wayne, isolated by the method of Hagen and Guilfoyle, 1985) with the single 22-nt oligonucleotide primer (FIG. 1; SEQ ID NO: 1) generated high levels of Gm776. The amount of Gm776 generated in each PCR amplification suggested that SIRE1-1 is a member of a multi-copy DNA family, and the absence of additional bands suggested that the family is relatively conserved.

[0103] Hybridization and restriction digest analyses were performed to characterize the element size of the SIRE1 family. Soybean genomic DNA was cleaved separately with BamHI, EcoRI, HaeIII, HindIII, HpaI, and MboI, electrophoresed through 0.7% agarose, and blotted to a nylon membrane. The blot was hybridized with radiolabeled Gm776 cDNA in 0.05 M Tris, 1 M NaCl, pH 7.5, in 50% formamide at 42° C., washed, and subjected to autoradiography (Southern, 1975). These analyses indicated that the SIRE1 family is composed of several hundred non-tandem, highly homogeneous copies, each in excess of 10.6 kb in length.

[0104] XbaI linkers were ligated to agarose gel electrophoresis (AGE)-purified Gm776 (modified Gm776) (Sambrook et al., 1989; Titus, 1991). The modified Gm776 DNA was extracted with phenol/chloroform and chloroform, ethanol-precipitated, and redissolved in 10 mM Tris-HCl, 1 mM EDTA, pH 7.6. pUC19 was linearized with XbaI and dephosphorylated (Sambrook et al., 1989). Linearized pUC19 DNA and the modified Gm776 DNA insert carrying the ligated XbaI linkers were ligated, and DH5-α cells were transformed with the ligation products. Transformants were identified by resistance to the antibiotic ampicillin (ampʳ), and the presence of plasmids containing the insert in the ampʳ lac⁻ colonies was determined by hybridization with a ³²P-labeled probe synthesized from PCR-amplified, PAGE-purified Gm776 DNA. Plasmid DNA from colonies giving positive hybridization signals was isolated by alkaline lysis (Sambrook et al., 1989).

[0105] The recovered pGm776 plasmid DNA was sequenced by dideoxynucleotide chain termination using Sequenase 2.0 (U.S. Biochemical, Cleveland, Ohio) and plasmid-specific and insert-specific primers according to the manufacturer's instructions (FIG. 2, SEQ ID NO: 2; FIGS. 5A and B, SEQ ID NOS: 12-24). Sequence analysis suggested that SIRE1-1 is a member of the copia/Ty1 retrotransposon family. SIRE1-1 sequences were subsequently detected by hybridization studies using the Gm776 cDNA probe in the genome of G. max cv Williams, in several different cultivars, and in the ancestral species, Glycine soja. The copy number of the element among these sources varies from a few hundred to over a thousand. The variation in copy number, especially among domestic cultivars, suggested that the family remains active, i.e., capable of replication and transposition. The homogeneity of the sizes of the SIRE1 family members also suggested that most are relatively young and have not had time to accumulate a large number of mutations.

[0106] The nucleotide sequence of Gm776 and all six of its possible peptide translations were compared to sequences in the GenBank and EMBL databases (Devereux et al., 1984). No closely related sequences were revealed in these searches. However, statistical analyses of sequence similarities between Gm776 and retrotransposons from A. thaliana and Saccharomyces cerevisiae, performed using the Gap computer program (Devereux et al., 1984), revealed lengthy, albeit weak, sequence similarities. The results of the analyses are set forth in FIG. 4. Column (a) in FIG. 4 denotes the nucleotide ranges within Gm776 that exhibit sequence similarities to other retrotransposon elements, and column (b) denotes the retrotransposon elements that exhibit nucleotide sequence homology to the sequences in column (a). Column (c) shows the percentage identity between the sequence ranges in columns (a) and (b), with gap weights of 3.0 for Ta1 and 2.0 for Ty1 and a gap length weight of 0.3. Two overlapping 300-plus bp regions between nt 150 and 670 of Gm776 exhibit over 50% identity to adjacent regions overlapping the Ta1 RNA binding domain. The alignments include seven gaps in each sequence, averaging 2.5 bp per gap.

[0107] When the six potential Gm776 translation products were compared to the sequence of the Ta1 polyprotein in the region of DNA similarity, no similarities were observed. However, 51% of the nucleotides between bp 390 and 630 of Gm776 are identical to a sequence within the reverse transcriptase gene of the Saccharomyces cerevisiae retrotransposon Ty1. The alignment requires five gaps averaging 2 bp per gap. There is no significant similarity between any of the six potential Gm776 translation products and the corresponding region of the S. cerevisiae reverse transcriptase. Sequence comparisons with several other plant transposons, including the copia-like elements Tnt1 from tobacco (Grandbastien et al., 1989), Tst1 from potato (Camirand et al., 1990), and PDR1 from pea, did not reveal significant similarities.

[0108] Column (d) in FIG. 4 denotes the "qualities" of the sequence matches denoted in column (c), and column (e) denotes the qualities and standard deviations of randomized sequence alignments of the same lengths and base compositions. Column (h) represents the probabilities (P) for a normal distribution, calculated using the equation $P = 0.3989\,e^{-x^{2}/2}$, where $x = (Q - \text{mean } Q)/\text{S.D.}$ The results indicate that the derived similarities are quite significant, especially as approximately 150,000 nucleotides in 30 transposons were analyzed.
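The probability in column (h) is the standard normal density evaluated at the distance of an observed quality score from the mean of the randomized alignments, measured in standard deviations; the constant 0.3989 is 1/√(2π). The following minimal Python sketch evaluates that equation; the function name and the example inputs are hypothetical, and no values from FIG. 4 are reproduced.

```python
import math


def normal_density_p(q: float, mean_q: float, sd: float) -> float:
    """P = 0.3989 * exp(-x**2 / 2), with x = (Q - mean Q) / S.D.

    q:      quality of the observed alignment (as in column (d))
    mean_q: mean quality of the randomized alignments (as in column (e))
    sd:     standard deviation of the randomized qualities (as in column (e))
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    x = (q - mean_q) / sd
    # 1/sqrt(2*pi) is approximately 0.3989, the peak of the standard normal density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


# Hypothetical inputs: an observed quality 6 S.D. above the randomized mean
print(f"{normal_density_p(60.0, 30.0, 5.0):.2e}")  # prints 6.08e-09
```

As the equation is written, P is a density value rather than a tail probability. It falls off rapidly as the observed quality moves further from the randomized mean.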

[0109] A soybean cDNA lambda gt11 bacteriophage library (Clontech) was screened for the presence of SIRE1 cDNAs by hybridization methods well known in the art (Sambrook et al., 1989). The radiolabeled probe was generated from the pGm776 plasmid using the Multiprime DNA Labeling kit (Amersham, Arlington Heights, Ill.). Three phage plaques (out of 6,000 screened) showed positive hybridization signals and were isolated by limiting dilution and rescreening. Recombinant phage DNA from one of the clones was isolated from plate lysates (Sambrook et al., 1989) and purified on a Qiagen-100 column as recommended by the manufacturer (Qiagen, Chatsworth, Calif.). The clone contained a 4.0 kilobase pair (kb) insert that was transferred from the phage vector to pUC18 as follows. The purified phage DNA was digested with EcoRI, extracted with phenol/chloroform and chloroform, ethanol-precipitated, and redissolved in 10 mM Tris-HCl, 1 mM EDTA, pH 7.6. pUC18 was linearized with EcoRI and dephosphorylated (Sambrook et al., 1989). Linearized pUC18 DNA and the 4.0 kb EcoRI DNA insert were ligated, and DH5-α cells were transformed with the ligation product. Transformants were identified by resistance to the antibiotic ampicillin (ampʳ), and the presence of plasmids containing the insert in the ampʳ lac⁻ colonies was determined by hybridization with a ³²P-labeled probe synthesized from PCR-amplified, gel-purified Gm776 DNA.

[0110] Plasmid DNA from colonies giving positive hybridization signals was purified over a Qiagen-100 column as described above. Digestion of the plasmid DNAs with EcoRI initially generated insert fragments of 2.4 and 1.6 kb, of which only the former hybridized to the Gm776 probe. However, the recombinant plasmid isolated for sequencing contained only the 2.4 kb SIRE1-1 fragment, and re-isolation of the original construct proved difficult. The 2.4 kb cDNA insert was sequenced by dideoxynucleotide chain termination using Sequenase 2.0 (U.S. Biochemical, Cleveland, Ohio) and plasmid-specific and insert-specific primers according to the manufacturer's instructions, and was found to be 2389 bp in length (FIG. 6; SEQ ID NO: 3; GenBank Accession No. U22103).

[0111] The cDNA was found to contain an uninterrupted 617-codon open reading frame (ORF) beginning at nucleotide (nt) 236 (FIGS. 6 and 8; SEQ ID NOS: 8, 9). A second, 87-codon ORF begins at nt 2155 and continues through the end of the truncated fragment (FIGS. 6 and 8). The ATG codon at nt 236 is the fourth ATG in the sequence. Extended leader regions with ATGs upstream of the actual translational start site are not unknown among retroelement mRNAs (Varmus and Brown, 1989). In the SIRE1-1 cDNA (SEQ ID NO: 8), the first ATG, at nt 28, is followed immediately by a stop codon, and initiation at either of the two other upstream ATGs would produce only a dipeptide. It has been suggested that 40S ribosomal subunits can reinitiate and resume scanning beyond very short upstream ORFs (Kozak, 1991). The ATG at nt 236 is closely followed by another in-frame ATG at nt 242, which is actually in a more representative context for translational initiation than the former (Heidecker et al., 1986).
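ORF coordinates such as a 617-codon ORF starting at nt 236, or an 87-codon ORF starting at nt 2155 that runs off the end of a truncated clone, come from scanning each forward reading frame for ATG-initiated stretches free of stop codons. The following is a minimal sketch of such a scan. It assumes 1-based coordinates, the three standard stop codons, and a codon count that includes the ATG but excludes the stop codon; the counting convention used for the figures is not stated, and `find_orfs` is a hypothetical helper.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> Iterator[Tuple[int, int, bool]]:
    """Yield (start_nt, n_codons, truncated) for ATG-initiated ORFs.

    start_nt:  1-based position of the A of the ATG
    n_codons:  codons from the ATG up to, but not including, the stop codon
    truncated: True if the sequence ends before a stop codon is reached
    Only the given strand is scanned; in-frame ATGs nested inside an ORF
    (such as a second ATG a few codons downstream) are not reported separately.
    """
    seq = seq.upper()
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            n_codons = (j - i) // 3
            truncated = j + 3 > len(seq)
            if n_codons >= min_codons:
                yield (i + 1, n_codons, truncated)
            i = j + 3  # resume scanning after the stop codon (or past the end)
```

For example, `list(find_orfs("CCATGAAATTTTAGGG"))` returns `[(3, 3, False)]`. A short upstream ORF such as an ATG followed immediately by a stop codon appears as a 1-codon entry, which is the kind of uORF that scanning ribosomes are thought to bypass by reinitiation.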

[0112] ORF1 of SIRE1-1 (FIGS. 6, 8, and 9; SEQ ID NO: 9) contains three regions that are characteristically highly conserved among retroviral and retrotransposon polyproteins (Katz and Jentoft, 1989; Varmus and Brown, 1989). The first two are CX₂CX₄HX₄C (SEQ ID NO: 60) nucleic acid-binding motifs (i.e., CCHC boxes, where C represents cysteine, H represents histidine, and X denotes any amino acid) found in the retroviral and retrotransposon nucleocapsid (NC) proteins encoded by gag, and the third is a catalytic domain (LDSG: leucine-aspartic acid-serine-glycine) characteristic of the prot-encoded aspartic proteases that cleave retroelement polyproteins.
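The CX₂CX₄HX₄C definition translates directly into a pattern search over a predicted protein sequence. The following minimal sketch uses Python's `re` module. The helper names are hypothetical, and "spacing" is taken here as the number of residues between the end of one box and the start of the next, which may differ from the convention behind the 189-codon figure in the next paragraph.

```python
import re
from typing import List, Tuple

# CX2CX4HX4C: Cys, any two residues, Cys, any four, His, any four, Cys.
# The zero-width lookahead lets overlapping matches be reported.
CCHC_BOX = re.compile(r"(?=(C.{2}C.{4}H.{4}C))")


def find_cchc_boxes(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start residue, matched 14-residue box) for each CCHC box."""
    return [(m.start() + 1, m.group(1)) for m in CCHC_BOX.finditer(protein.upper())]


def box_spacing(boxes: List[Tuple[int, str]]) -> List[int]:
    """Number of residues between the end of each box and the start of the next."""
    return [b[0] - (a[0] + len(a[1])) for a, b in zip(boxes, boxes[1:])]
```

On a toy sequence made of two boxes separated by five alanines, `box_spacing(find_cchc_boxes(seq))` returns `[5]`. Applied to the ORF1 translation, the same search would report the two widely separated boxes discussed below.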

[0113] In a few characterized retroelements, the CCHC boxes in the gag region are repeated. The repetition of the CCHC boxes in SIRE1-1 is unique in that the boxes are separated by 189 codons, rather than by just a few codons as in other retroelements (FIG. 8). As NC proteins are generally less than 100 amino acids in length, it is possible that the SIRE1-1 boxes are expressed in two distinct proteins.

[0114] Both SIRE1-1 CCHC boxes are flanked by highly basic regions, especially the region between the boxes: seven of the nine amino acids that precede the downstream box are lysine or arginine. This is characteristic of retroelement NC proteins, which are highly basic and dominated by polar amino acids. Although the boundaries of the SIRE1-1 NC proteins are not yet defined, CCHC boxes are generally found near the carboxy-terminus. The putative NC protein encompasses roughly amino acids 260 to 525. This region is highly basic (23% basic residues) and very polar (62% polar residues). Sequence comparisons between the SIRE1-1 protease peptide sequence and those of other retroelements firmly place SIRE1 in the copia/Ty1 family (FIGS. 9 and 10).
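The composition figures above are residue fractions over a stated region, and "seven of nine" is a count over a short window. The text does not say which residues were counted as basic or as polar. The following sketch therefore assumes Lys, Arg and His as basic and the charged plus uncharged polar residues as polar; both residue sets and all helper names are hypothetical.

```python
BASIC = frozenset("KRH")            # assumption: His counted as basic
POLAR = frozenset("KRHDESTNQCY")    # assumption: charged plus uncharged polar residues
LYS_ARG = frozenset("KR")


def region(protein: str, start: int, end: int) -> str:
    """Residues start..end of a protein (1-based, inclusive)."""
    return protein[start - 1:end]


def fraction(seq: str, residues: frozenset) -> float:
    """Fraction of residues in seq that belong to the given residue set."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in residues for aa in seq) / len(seq)


def count_in_window(seq: str, end: int, width: int, residues: frozenset) -> int:
    """Count set members among the `width` residues ending at 1-based position `end`."""
    return sum(aa in residues for aa in seq.upper()[end - width:end])
```

For the putative NC region, one could call `fraction(region(orf1, 260, 525), BASIC)`, where `orf1` is the predicted ORF1 translation (not reproduced here). The "seven of nine" observation corresponds to `count_in_window(orf1, p - 1, 9, LYS_ARG)`, with `p` the first residue of the downstream box.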

[0115] Retroelement (−) strand replication is usually primed by a host tRNA, often the initiator tRNA. A 22-nt primer binding site (PBS) complementary to the 3′ end of soybean tRNA^(met−1) (SEQ ID NO: 76) lies upstream of the SIRE1-1 ORFs, between nucleotides 180 and 201 (SEQ ID NO: 6). See FIG. 11. Retroelement PBSs are generally located adjacent to the 5′ LTR (Boeke, 1989). Two bases separate the 5′ end of the SIRE1-1 PBS from the dinucleotide CA found at the 3′ end of nearly every LTR. The sequence of the downstream LTR from a genomic clone (see Example 2) confirms that this dinucleotide marks the end of the LTR. The putative SIRE1-1 LTR (SEQ ID NO: 7) shows significant homology to the terminal 17 nt of the 5′ LTR of the potato retrotransposon Tst1 (SEQ ID NO: 77). See FIG. 12.
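Because the PBS pairs antiparallel with the tRNA, the reverse complement of the PBS (read 5′→3′ on the element's plus strand) should match the 3′-terminal nucleotides of the tRNA. The following minimal sketch checks this by counting mismatches. The helper names and the toy sequences are hypothetical, and neither SEQ ID NO: 6 nor SEQ ID NO: 76 is reproduced.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase or lowercase A/C/G/T sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def pbs_mismatches(pbs: str, trna_3prime: str) -> int:
    """Mismatches between a PBS and the 3' end of the tRNA it should pair with.

    pbs is read 5'->3' on the element's plus strand; trna_3prime is the tRNA
    sequence read 5'->3' (U is accepted and treated as T).
    """
    trna = trna_3prime.upper().replace("U", "T")
    if len(trna) < len(pbs):
        raise ValueError("tRNA sequence is shorter than the PBS")
    target = trna[-len(pbs):]
    return sum(a != b for a, b in zip(reverse_complement(pbs), target))
```

A fully complementary 22-nt PBS, such as the one at nt 180-201 (201 − 180 + 1 = 22 nt), would give `pbs_mismatches(...) == 0` against the tRNA 3′ end.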

[0116] An unusual feature of SIRE1-1 is the presence of a 95-bp, nearly tandem, direct repeat between nt 2096 and 2299 (FIG. 6; SEQ ID NO: 3). The repeats are separated by 3 bp. The upstream member has an 11-bp insertion that is absent in the downstream member; otherwise, the sequences are 95% identical. This 5% divergence makes it very unlikely that the duplication was created during the cloning process.

[0117] The 2.4 kb cDNA sequence was aligned to the corresponding region of Gm776, and it was found that the amplified fragment lies completely within the gag region of the 2.4 kb fragment and that the two sequences differ by only 2% (FIG. 22). Of the 13 bp differences, seven retain the same amino acid. Of the remaining six, three result in the substitution of one non-polar amino acid for another (isoleucine for phenylalanine, isoleucine for valine, and leucine for methionine), and two are substitutions of threonine by isoleucine. The remaining difference generates a stop codon in Gm776. Among the amino acid changes, only the threonine-to-isoleucine substitution is not considered to be a conservative replacement. The predominance of silent and conservative substitutions strongly suggests that the differences reflect the slightly diverged evolutionary relationship between two SIRE1 family members.
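A tally like the one above (silent changes, amino acid replacements, and one new stop codon) can be reproduced by translating aligned, in-frame codons with the standard genetic code. The following minimal sketch assumes the two sequences are already aligned without gaps and begin in frame. It counts differing codons rather than nucleotides, so a codon carrying two nucleotide differences is counted once; the function name is hypothetical.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(ref: str, alt: str) -> Dict[str, int]:
    """Count silent, missense and nonsense codon changes from ref to alt."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"silent": 0, "missense": 0, "nonsense": 0}
    ref, alt = ref.upper(), alt.upper()
    for i in range(0, len(ref) - len(ref) % 3, 3):
        c_ref, c_alt = ref[i:i + 3], alt[i:i + 3]
        if c_ref == c_alt:
            continue
        aa_ref, aa_alt = CODON_TABLE[c_ref], CODON_TABLE[c_alt]
        if aa_ref == aa_alt:
            counts["silent"] += 1
        elif aa_alt == "*":
            counts["nonsense"] += 1  # new stop codon in alt
        else:
            counts["missense"] += 1  # includes stop-to-sense changes
    return counts
```

Whether a missense change is "conservative" (for example, isoleucine for valine) is a separate judgment based on residue properties, and this sketch does not attempt it.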

EXAMPLE 2

[0118] Isolation and Characterization of the SIRE1-1 Genomic Clone

[0119] Oligonucleotide primers (FIG. 5B; SEQ ID NOS: 15-24) were used in PCR to amplify fragments from the gag and pol regions and from part of the adjacent LTR of the 2.4 kb cDNA clone. These amplified fragments and synthetic oligonucleotides (FIG. 5) were used to generate gag- and LTR-specific radiolabeled probes. A λFIXII soybean genomic library (Stratagene, La Jolla, Calif.) was probed with radiolabeled SIRE1-1 gag probes, and positively hybridizing plaques were purified by limiting dilution screening (Sambrook et al., 1989). DNA was prepared from phage recovered from liquid culture (Burmeister and Lehrach, 1996).

[0120] The phage DNAs containing the putative SIRE1 genomic clones were digested with the restriction endonuclease Not I to release the DNA inserts from the phage. The largest DNA inserts obtained in this way were digested with Xba I, and Southern blots of the digested DNAs were probed with an end-labeled, LTR-specific oligonucleotide to identify clones carrying two LTRs. Analysis of one clone yielded two hybridizing bands, indicating that this clone contained two LTRs and was a probable source of a full-sized, intact copy of SIRE1-1. The purified phage DNA containing the full-length SIRE1-1 genomic clone was deposited with the American Type Culture Collection, 12301 Parklawn Drive, Rockville, Md. 20852 on Aug. 12, 1997 (ATCC accession number 209200) in accordance with the requirements of the Budapest Treaty.

[0121] Restriction endonuclease digestion of the phage DNA with Xba I yielded three fragments of 8.5, 6.5 and 4.2 kb. Southern hybridization of the electrophoretically separated fragments with a radioactively labeled 2.4 kb SIRE1-1 cDNA probe revealed that the SIRE1-1 2.4 kb cDNA sequence extends across the 12.5 kb and 4.2 kb Xba I fragments.

[0122] The fragments were each subcloned into a pSPORT-1 plasmid (Life Technologies, Gaithersburg, Md.) for automated DNA sequencing. Some of these subclones were unstable, but the one carrying the 4.2 kb Xba I fragment, which hybridized to the LTR probe but not to the gag probe, displayed no evidence of rearrangement. Both strands of this 4.2 kb clone were sequenced on ABI Prism 377 DNA sequencers using pUC universal primers and the oligonucleotide primers listed in FIG. 19 (SEQ ID NOS: 25-38). This sequence (FIG. 13; SEQ ID NO: 8) is available as GenBank Accession number U96295.

[0123] The 4.2 kb XbaI fragment encompasses the 3′ end of the genomic clone and contains the distal 3.7 kb of SIRE1-1 along with 538 bp of presumably single-copy flanking DNA (FIG. 14). Analysis and predicted translation of the SIRE1-1 genomic sequence revealed the presence of two ORFs (FIG. 14). The first, ORF1 (SEQ ID NOS: 9 and 11; see FIG. 15A), extends from nucleotide (nt) 1 to nt 191 and is clearly the 3′ end of a retroelement ribonuclease H (RH)-encoding sequence. The 3′ terminus of the SIRE1-1 RH coding region exhibits significant amino acid sequence homology (i.e., 53% identity and 87% similarity) with the carboxy-terminus of RNase H from copia (FIG. 17). In other copia/Ty1-like retrotransposons, the RH coding sequence is at the 3′ end of the pol gene and is closely followed by a polypurine tract (PPT) and the 3′ LTR. In SIRE1-1, however, the RH coding region of pol is followed by a long ORF in the region corresponding to retroviral env (see below).

[0124] The second ORF within this fragment, ORF2, extends from nt 219 to nt 1958. The predicted translation product suggests that ORF2 encodes a full-length, envelope (env)-like glycoprotein characteristic of animal retroviruses (FIGS. 15A and 15B; SEQ ID NOS: 10 and 59; FIG. 16; SEQ ID NO: 84). Retroviral envelope proteins are synthesized from a spliced transcript in which the initiation codon is supplied by the gag region, which for SIRE1-1 was found in the 2.4 kb cDNA clone (Example 1; SEQ ID NO: 3). The amino-terminal one-third of the SIRE1-1 env sequence is rich in proline, serine, and threonine codons, the latter two possibly serving as O-glycosylation sites. There is also a small number of asparagines in this region that might serve as N-glycosylation sites.

[0125] Although the predicted amino acid sequence of ORF2 does not exhibit significant homology with known env proteins, its predicted secondary structure is typical of animal retrovirus env proteins. The failure to find high amino acid homology with other retroviral proteins is not surprising, as it is likely that SIRE1-1 and the animal retroviruses diverged before either had acquired an env-encoding region.

[0126] A typical retroviral env protein has a signal peptide near the amino-terminus. There is a likely hydrophobic signal peptide at codons 22-43 of the SIRE1-1 env sequence (FIG. 16; SEQ ID NO: 84). Near the carboxy-terminus of retroviral envelope proteins, a hydrophobic domain anchors the molecule in the membrane so that the protein is oriented with its N-terminus outside the cell and its C-terminus within the cytoplasm. Codons 511 to 531 of the SIRE1-1 env sequence (SEQ ID NO: 84) constitute a hydrophobic region that may provide this function (FIG. 16). These assignments and the corresponding membrane orientations are strongly supported by analysis with the transmembrane prediction computer program TMpredict (Hofmann and Stoffel, 1993) (see below).

[0127] ORF2 is 647 codons in length, and the derived, unmodified theoretical protein has a molecular weight of 70 kD. Despite its location immediately downstream of pol, the translated env amino acid sequence does not exhibit significant sequence identity to any reported retroviral env protein. This result is not entirely unexpected, because known env sequences constitute a very heterogeneous population, and pair-wise comparisons often fail to demonstrate significant sequence congruence (Doolittle et al., 1989; McClure, 1991). Alternatively, ORF2 could be a transduced cellular sequence. For example, Bs1 from maize, a low copy-number LTR retrotransposon that lacks its own RT (Johns et al., 1989; Jin and Bennetzen, 1989), encodes domains derived from a maize plasma membrane H⁺-ATPase (Bureau et al., 1994; Palmgren, 1994).

[0128] Retroviral env genes encode polypeptides that are cleaved by host proteases into surface (SU) and transmembrane (TM) peptides, which are subsequently rejoined through disulfide linkages (Hunter and Swanstrom, 1990). While the primary sequences of these proteins may be diverse, all retroviral env proteins are glycosylated and share three functionally conserved hydrophobic domains: a signal peptide near the amino terminus of SU, a membrane fusion peptide near the amino terminus of TM, and a distal anchor peptide (Hunter and Swanstrom, 1990).

[0129] Retroviral env glycoproteins contain between four and thirty N-glycosylated asparagines at Asn-Xaa-Ser/Thr motifs (Hunter and Swanstrom, 1990), with SU generally more heavily glycosylated than TM. The conceptual translation product of ORF2 from SIRE1-1 has only two Asn residues in this context. However, retroelement env proteins are also known to be O-glycosylated at Ser and Thr residues (Pinter and Honnen, 1988). O-glycosylation is correlated with clusters of hydroxy amino acids with elevated frequencies of Pro (Wilson et al., 1991). The amino half of the theoretical SIRE1-1 protein (corresponding to SU) conforms to this pattern, and many of the hydroxy amino acids in the carboxyl half of the protein are adjacent to Pro. The amino acid composition of one extended proline-rich region encompassing amino acids 60 through 127 (SEQ ID NO: 83) is similar to that of the 60-amino acid proline-rich neutralization (PRN) domain of SU from feline leukemia virus (FeLV) (Fontenot et al., 1994). Pro makes up 18% of both regions, and hydroxy amino acids make up 20% of the FeLV PRN and 22% of the SIRE1-1 region. Gln is 9% in FeLV and 10% in SIRE1-1, and while the PRN of FeLV contains no aromatic amino acids, the comparable SIRE1-1 region contains only one. In SIRE1-1, the spacing of many of the Pro residues in this region and beyond, (Xaa-Pro-Yaa)ₙ or (Xaa-Pro)ₙ, is characteristic of many structural membrane proteins from both eukaryotes and prokaryotes (Williamson, 1994).
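Both glycosylation criteria above can be checked with simple sequence scans. The following minimal sketch locates Asn-Xaa-Ser/Thr sequons and lists Ser/Thr residues immediately adjacent to Pro. It assumes the widely used refinement that Xaa is not Pro, which the text does not state. Matching a sequon is necessary but not sufficient for glycosylation. Helper names are hypothetical.

```python
import re
from typing import List

# Asn-Xaa-Ser/Thr sequon; Xaa != Pro is an assumption added here.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def n_glycosylation_sites(protein: str) -> List[int]:
    """1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def hydroxy_next_to_pro(protein: str) -> List[int]:
    """1-based positions of Ser/Thr residues with a Pro immediately before or after."""
    p = protein.upper()
    hits = []
    for i, aa in enumerate(p):
        if aa not in "ST":
            continue
        before = i > 0 and p[i - 1] == "P"
        after = i + 1 < len(p) and p[i + 1] == "P"
        if before or after:
            hits.append(i + 1)
    return hits
```

For example, `n_glycosylation_sites("MNASNPTNGT")` returns `[2, 8]`. The Asn at position 5 is skipped because it is followed by Pro.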

[0130] The putative env protein sequence was evaluated for the presence of hydrophobic, membrane-spanning helices using TMpredict (Hofmann and Stoffel, 1993). The program returned two possible transmembrane regions with high confidence values and a third somewhat below the margin of significance (FIG. 23). The first predicted helix encompasses amino acids 22 to 43 (SEQ ID NO: 83), a typical signal peptide location. The second predicted transmembrane helix extends from amino acid 510 to amino acid 530 (SEQ ID NO: 83) and corresponds to the general location of retroviral anchor peptides. Although of questionable statistical significance, the third predicted transmembrane helix, from amino acids 465 to 485, is in a location that could correspond to that of viral membrane fusion peptides.
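TMpredict's own scoring is not reproduced here. As a simpler, substituted illustration of how hydrophobic segments such as residues 22 to 43 or 510 to 530 can be flagged, the following sketch uses a Kyte-Doolittle hydropathy sliding window. The 19-residue window and the 1.6 mean-hydropathy threshold are commonly quoted defaults for that method and are assumptions here. This is not the TMpredict algorithm.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(protein: str, window: int = 19) -> List[Tuple[int, float]]:
    """(1-based start, mean hydropathy) for every complete window."""
    vals = [KD[aa] for aa in protein.upper()]
    return [(i + 1, sum(vals[i:i + window]) / window)
            for i in range(len(vals) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merge overlapping above-threshold windows into (start, end) segments."""
    segments: List[List[int]] = []
    for start, mean in window_means(protein, window):
        if mean < threshold:
            continue
        end = start + window - 1
        if segments and start <= segments[-1][1] + 1:
            segments[-1][1] = end  # extend the current segment
        else:
            segments.append([start, end])
    return [(s, e) for s, e in segments]
```

On a toy sequence of 10 Lys, 25 Leu and 10 Lys, this flags residues 6 to 40. The segment edges extend slightly past the Leu run because a window only needs a hydrophobic majority to pass the threshold. Dedicated predictors such as TMpredict also score helix orientation, which this sketch does not do.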

[0131] Only two retroviral env peptides have been structurally characterized by X-ray crystallography (Chan et al., 1997; Fass et al., 1996), but several env SU and TM sequences have been analyzed with structural prediction programs (Hunter and Swanstrom, 1990; Gallaher et al., 1995; Gallaher et al., 1989). Analysis of the ORF2 sequence using the computer program NNpredict (Kneller et al., 1990) suggests the presence of long α-helices and β-sheet regions (FIG. 20) of the kind typically found in env proteins. Evaluation of ORF2 using several other programs (Deleage and Roux, 1987; Georjon and Deleage, 1995; Georjon and Deleage, 1994; Gibrat et al., 1987; Levin et al., 1986) yielded predictions of multiple α-helices similar to those of corresponding regions of other retroviral env proteins (Hunter and Swanstrom, 1990; Gallaher et al., 1995; Gallaher et al., 1989).

[0132] ORF2 (SEQ ID NO: 83) was also evaluated for the possible presence of coiled-coils (Lupas et al., 1991). Amino acids 580 to 611 were predicted to form a coiled-coil with very high confidence (FIG. 23). The sequence adheres well to the heptad repeat sequence identified in several virus fusion peptides (Chambers et al., 1990). The predicted coiled-coils in the TM domains of HIV and Moloney murine leukemia virus have recently been confirmed by X-ray crystallography (Chan et al., 1997; Fass et al., 1996).

[0133] Retroviral env proteins are generated from spliced transcripts (Varmus and Brown, 1989; Hunter and Swanstrom, 1990). In some avian retroviruses, splicing leads to an in-frame fusion of the gag start codon with the 5′ end of the env coding region (Hunter and Swanstrom, 1990), obviating the need for an initiating AUG in env. An analogous splice in a SIRE1-1 transcript would serve the same purpose, although no splice donor or acceptor consensus sequences are present in the expected regions. Cleavage of env proteins into SU and TM generally occurs at a conserved site containing the consensus sequence Arg-Xaa-Lys-Arg (Hunter and Swanstrom, 1990). This sequence does not appear in the putative SIRE1-1 env, but there are several similarly basic tetrapeptide candidates for such a cleavage site (FIG. 23). Cleavage at the Lys-Lys-Gly-Lys (SEQ ID NO: 82) at residues 439-442 would generate a TM protein of 22.3 kD with the fusion peptide near its amino terminus; the corresponding SU would be 48.7 kD.
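The SU and TM masses for a candidate cleavage site follow from summing residue masses on each side of the cut. The following minimal sketch assumes cleavage immediately after residue 442, standard average residue masses, no glycosylation, and one water per peptide chain. When only a length is known, the usual rough figure of about 110 Da per residue gives about 48.6 kD for 442 residues and about 22.5 kD for the remaining 205 residues, if ORF2 is treated as 647 residues. These values are close to the 48.7 kD and 22.3 kD stated above. Helper names are hypothetical.

```python
from typing import Tuple

# Average residue masses in daltons (free amino acid mass minus water)
RESIDUE_DA = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_DA = 18.0153
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average, used when only a length is known


def peptide_kda(seq: str) -> float:
    """Mass of an unmodified peptide chain in kDa (no glycosylation)."""
    return (sum(RESIDUE_DA[aa] for aa in seq.upper()) + WATER_DA) / 1000.0


def approx_kda(n_residues: int) -> float:
    """Rough mass in kDa from chain length alone."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


def split_at(seq: str, last_su_residue: int) -> Tuple[str, str]:
    """Split a precursor after 1-based residue `last_su_residue` into (SU, TM)."""
    return seq[:last_su_residue], seq[last_su_residue:]
```

With the ORF2 translation in hand, `su, tm = split_at(orf2, 442)` followed by `peptide_kda(su)` and `peptide_kda(tm)` gives the unglycosylated SU and TM masses for this candidate site. Glycosylated proteins would run larger on gels.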

[0134] To confirm that the putative env gene was not a library or cloning artifact, and that most, if not all, genomic copies of SIRE1 were organized in the same way as the clone, soybean genomic DNA was digested with several restriction enzymes, and a Southern blot was probed with sequences from the env and gag subclone regions. The intensity of hybridization of an env probe to genomic DNA was similar to that of the gag probe previously used to establish the moderately high copy number of SIRE1-1 (Laten and Morris, 1993). In addition, the gag and env probes hybridized to the same 10.5 kb HpaI fragment. Although the possibility cannot be ruled out, this env-like ORF is probably not a transduced host gene. The presence of this ORF in most, if not all, of the several hundred copies of SIRE1 suggests that this gene is an integral part of the retroelement genome.

[0135] Alternate splicing could result in an additional ORF extending from nt 1834 to 2166, thereby encoding a 110-amino acid peptide. Such alternate splicing of retroviral transcripts at similar sites has been shown to lead to the production of trans-acting factors, which may be useful in modulating gene expression in accordance with the present invention.
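The peptide length follows from the coordinates by simple codon arithmetic. The sketch assumes the coordinates are 1-based and inclusive and that the span ends with the stop codon: 2166 − 1834 + 1 = 333 nt, or 111 codons, one of which is the stop, leaving 110 amino acids.

```python
def orf_peptide_length(start_nt: int, end_nt: int, includes_stop: bool = True) -> int:
    """Amino acids encoded by an ORF given inclusive 1-based nucleotide coordinates.
    Assumes the span is in frame and, by default, ends with a stop codon."""
    length_nt = end_nt - start_nt + 1
    if length_nt % 3 != 0:
        raise ValueError(f"ORF length {length_nt} nt is not a multiple of 3")
    codons = length_nt // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(orf_peptide_length(1834, 2166))  # 333 nt = 111 codons, minus the stop codon
```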

[0136] To identify the LTR, the DNA sequence (SEQ ID NO: 8) from the 4.2 kb XbaI fragment was aligned with that from the SIRE1-1 cDNA clone (SEQ ID NO: 3), which contained the last 178 bp of the 5′ LTR. Sequence alignments were made using the Genetics Computer Group (GCG) package (Devereux et al., 1984). The GCG analysis confirmed that the genomic subclone contained a 3′ LTR and fixed the location of the 3′ end of the LTR at nt 3686 in the sequence AATTTCA (FIG. 3; SEQ ID NO: 8), beyond which the two sequences diverged. Although the region of LTR overlap was virtually identical (98% sequence identity), the moderately high copy number of SIRE1 makes it unlikely that the cDNA and genomic clones represent copies of the same element.

[0137] Upstream of the genomic LTR there are several polypurine regions ranging in length from 11 to 16 nucleotides (FIGS. 13 and 14). Such sites are known to serve as origins for initiation of retroelement plus-strand synthesis. In addition, the SIRE1-1 LTR contains appropriately located sequences that strongly resemble consensus sequences for retroviral promoter elements and polyadenylation signals.
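Locating candidate polypurine tracts of the stated size range is a simple pattern search. The sketch below reports maximal runs of A/G on the given strand whose length falls between 11 and 16 nt; the toy sequence is hypothetical and is not taken from SEQ ID NO: 8.

```python
import re


def polypurine_tracts(seq: str, min_len: int = 11, max_len: int = 16) -> list[tuple[int, int, str]]:
    """(start, end, tract) for maximal A/G runs, 1-based inclusive coordinates,
    keeping only runs whose length lies within [min_len, max_len]."""
    tracts = []
    for m in re.finditer(r"[AG]+", seq.upper()):
        run = m.group()
        if min_len <= len(run) <= max_len:
            tracts.append((m.start() + 1, m.end(), run))
    return tracts


if __name__ == "__main__":
    toy = "TT" + "AGGAGAAGGGA" + "C" + "AAGG" + "T"  # hypothetical sequence
    print(polypurine_tracts(toy))  # the 4-nt AAGG run is too short to report
```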

[0138] The 538 nucleotides of flanking DNA adjacent to the 3′ end of the SIRE1-1 sequence (SEQ ID NO: 8) comprise an uninterrupted open reading frame (FIG. 14). This strongly suggests that the SIRE1-1 insertion disrupted a functional gene. As the G. max cultivar is essentially a tetraploid, its genome can accommodate some gene disruptions without major phenotypic consequences. The predicted translation product of the flanking DNA is relatively hydrophilic and is rich in asparagine and glutamine residues. However, no significant homology was found with known plant proteins.
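Whether a stretch of flanking DNA is an "uninterrupted open reading frame" can be checked by looking for stop codons in each of the six reading frames. A minimal sketch, assuming the standard genetic code's stop codons (TAA, TAG, TGA) and a hypothetical toy sequence:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def open_frames(seq: str) -> list[str]:
    """Labels (+1..+3, -1..-3) of reading frames containing no stop codon."""
    seq = seq.upper()
    frames = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            if not any(c in STOP_CODONS for c in codons):
                frames.append(f"{strand}{offset + 1}")
    return frames


if __name__ == "__main__":
    print(open_frames("ATGTAAGCC"))  # frame +1 is interrupted by TAA
```

Applied to the 538-nt flanking region, a frame listed by this function would be stop-free over its whole length, which is the property described in the text.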

[0139] To obtain other subclones of SIRE1-1, the genomic SIRE1-1 λFIXII bacteriophage DNA was double-digested with Hind III (which does not digest λFIXII DNA) and Sac I (which does digest λFIXII DNA in the multicloning region). This digest generated 10 fragments (FIG. 24). The two largest fragments, 20 kb and 9 kb, respectively, are known to constitute the lambda phage arms. The other eight fragments collectively constituted 19 kb of SIRE1-1 genomic sequence. Individual digests of the genomic clone with Hind III and Sac I, respectively, revealed that the 2.1 kb and 1.5 kb fragments produced in the double digest were adjacent to the lambda phage arms (data not shown). Therefore, these two fragments each have Hind III and Sac I termini, while the other 6 fragments have only Hind III termini.

[0140] Southern blot hybridizations were conducted with the Hind III/Sac I double-digested SIRE1-1 DNA using probes derived from the LTR, gag, and env regions of the 4.2 kb Xba I fragment, respectively (FIG. 25). These experiments revealed that the env sequence lies within the 4.1 kb fragment (FIG. 26); the LTR regions are contained within the 4.3 kb and 2.7 kb fragments; and the gag region is also contained within the 4.3 kb fragment (FIG. 27).

[0141] The 4.1 kb fragment (containing at least a portion of the env region) and the 4.3 kb fragment (containing at least a portion of the gag region) were each subcloned into pSPORT-1 vectors, and the constructs were separately transformed into DH10B E. coli cells. Recombinant plasmids were detected by restriction digestion and Southern hybridization. The vector construct comprising the 4.1 kb fragment was named pEG4.1 (FIG. 28), and the vector construct comprising the 4.3 kb fragment was named pEG4.3 (FIG. 29).

[0142] The pEG4.1 construct was sequenced using M13/pUC universal primers (pUC-forward and -reverse; SEQ ID NOS: 12, 14) and SIRE1-1-specific primers (FIG. 30; SEQ ID NOS: 39-49) as described above. Translation of the nucleotide sequence obtained thereby (FIGS. 31a-c; SEQ ID NO: 50) revealed a long uninterrupted open reading frame encoding 942 amino acids (FIG. 32; SEQ ID NO: 51). The 3′ terminus of the 4.1 kb Hind III fragment overlapped the 5′ terminus of the 4.2 kb Xba I fragment (described above, containing the env region) by approximately 1.5 kb. Translation of the remaining 2.6 kb of sequence revealed regions exhibiting strong homology to the integrase, reverse transcriptase, and RNase H regions of known retrotransposons.

[0143] The 4.3 kb Hind III fragment contained in pEG4.3 was partially sequenced using pUC universal primers (REF; SEQ ID NOS: 12, 14). The 5′ terminal region of the 4.3 kb fragment was found to contain sequence identical to that of the putative 3′ LTR contained within the 3′ terminal region of the 4.2 kb Xba I (env-containing) fragment (SEQ ID NO: 8). The 3′ terminal region of the 4.3 kb Hind III fragment contained sequences exhibiting strong homology to the amino-terminal region of the integrase (int) domain of known retrotransposons.

[0144] A region encompassing 400 amino acid residues, predicted from the contiguous nucleotide sequences of the 3′-terminal region of the 4.3 kb fragment and the 5′-terminal region of the 4.1 kb fragment, appears to constitute an integrase (int) domain (SEQ ID NO: 52). The predicted amino acid sequence of this putative int domain was compared against the BLAST-P peptide database. Significant homology was found with copia-like retrotransposons, the strongest homology being to the Opie-2 element from maize, which exhibited 39.8% identity and 58.5% similarity at the amino acid level, with three sequence gaps (FIG. 33). The putative SIRE1-1 and Opie-2 elements each contain a conserved HHCC (H-X4-H, C-X2-C) motif, which is usually found at the amino terminus of retrotransposon integrase domains (FIG. 33). The SIRE1-1 and Opie-2 elements also each contain a D(10)D(35)E motif (i.e., two aspartate residues within 10 residues of each other, and a glutamate residue within 35 residues of the pair in the carboxy-terminal direction) (FIG. 33).
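The two integrase motifs described above can be expressed as simple searches. The sketch follows the definitions given in the text; in particular, it assumes that "within 35 residues of the pair" is measured from the second aspartate. Real integrase annotation relies on alignment to known elements (FIG. 33); a bare pattern scan over a long protein will report many spurious D/D/E combinations. The toy sequences are hypothetical.

```python
import re


def find_hhcc(protein: str) -> tuple[list[int], list[int]]:
    """1-based starts of H-X4-H and C-X2-C patterns (the two halves of HHCC)."""
    hh = [m.start() + 1 for m in re.finditer(r"(?=H.{4}H)", protein)]
    cc = [m.start() + 1 for m in re.finditer(r"(?=C.{2}C)", protein)]
    return hh, cc


def find_dde(protein: str, max_dd: int = 10, max_de: int = 35) -> list[tuple[int, int, int]]:
    """(D, D, E) 1-based positions where the second D lies within max_dd residues
    of the first, and the E lies within max_de residues C-terminal to the second D
    (assumed reading of the D(10)D(35)E definition in the text)."""
    d_pos = [i for i, aa in enumerate(protein) if aa == "D"]
    e_pos = [i for i, aa in enumerate(protein) if aa == "E"]
    triples = []
    for d1 in d_pos:
        for d2 in d_pos:
            if 0 < d2 - d1 <= max_dd:
                for e in e_pos:
                    if 0 < e - d2 <= max_de:
                        triples.append((d1 + 1, d2 + 1, e + 1))
    return triples


if __name__ == "__main__":
    print(find_hhcc("AHAAAAHACAACA"))
    print(find_dde("ADAADAAE"))
```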

[0145] The break point between the integrase (int) and reverse transcriptase (RT) domains of SIRE1-1 was determined by comparison of the 4.1 kb fragment sequence with the sequences of retroelements in which the break point has been determined experimentally (Doolittle et al., 1989; McClure, 1991; Springer and Britten, 1993; Taylor et al., 1994; Rogers et al., 1995). The predicted amino acid sequence (SEQ ID NO: 53) of the reverse transcriptase domain extends from residue 401 to residue 781. This predicted sequence was compared against the BLAST-P peptide sequence database. Significant homology was found between the putative SIRE1-1 RT region and the RT regions of copia-like retrotransposons (FIG. 34). Again, the most significant match was to Opie-2 from maize, which exhibited 56% identity and 71% similarity at the amino acid level, with one sequence gap (FIG. 34). Several regions in which the SIRE1-1 RT exhibits near identity to that of Opie-2 encompass sequences that have proved useful in studying the phylogenetic relationships of retroelements (Xiong and Eickbush, 1990).
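The identity and similarity percentages quoted for Opie-2 come from BLAST, which counts "positives" as aligned pairs with a positive BLOSUM62 score. The sketch below only illustrates how such percentages are formed from an existing alignment; it substitutes a crude set of conservative-substitution groups (an assumption, not BLOSUM62), so its similarity values will not reproduce BLAST's. Percentages are taken over the full aligned length, with gap columns counted in the denominator.

```python
# Crude conservative-substitution groups: an illustrative assumption,
# not the BLOSUM62 "positives" that BLAST reports.
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
GROUP_OF = {aa: g for g in GROUPS for aa in g}


def identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and similarity over the aligned length.
    `a` and `b` are equal-length aligned strings; '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns count toward length but never match
        if x == y:
            identical += 1
            similar += 1
        elif GROUP_OF.get(x) is not None and GROUP_OF.get(x) == GROUP_OF.get(y):
            similar += 1
    n = len(a)
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    print(identity_similarity("AKLDE", "AKIDQ"))  # L/I counted as similar
```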

[0146] The break point between the reverse transcriptase (RT) and ribonuclease H (RH) regions of the SIRE1-1 4.1 kb fragment sequence was also predicted by comparison with those of known retroelements. The RH domain of SIRE1-1 appears to encompass predicted amino acids 782 to 942. This predicted sequence (SEQ ID NO: 54) was compared against the BLAST-P peptide sequence database. Not surprisingly, the strongest homology was found with the RH region of maize Opie-2, which exhibited 53.1% identity and 71.0% similarity to the predicted SIRE1-1 RH region (FIG. 35). The SIRE1-1 RH domain also contains the DEDD motif found in the RH regions of most known retrotransposons (FIG. 35).
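The domain boundaries stated above (RT at residues 401 to 781, RH at 782 to 942 of the 942-residue ORF) translate directly into slices of the predicted protein. In the sketch below, treating residues 1 to 400 of the same ORF as the int-containing portion is an assumption made for illustration, and the placeholder sequence stands in for SEQ ID NO: 51.

```python
# 1-based inclusive boundaries on the 942-residue ORF (SEQ ID NO: 51).
# RT and RH limits are from the text; the int-containing span is an assumption.
DOMAINS = {
    "int (assumed)": (1, 400),
    "RT": (401, 781),
    "RH": (782, 942),
}


def split_domains(protein: str, domains: dict[str, tuple[int, int]] = DOMAINS) -> dict[str, str]:
    """Slice a protein into named domains using 1-based inclusive coordinates."""
    out = {}
    for name, (start, end) in domains.items():
        if end > len(protein):
            raise ValueError(f"{name} ends at {end}, beyond protein length {len(protein)}")
        out[name] = protein[start - 1:end]
    return out


if __name__ == "__main__":
    placeholder = "A" * 942  # stands in for the real predicted sequence
    for name, seq in split_domains(placeholder).items():
        print(name, len(seq))  # 400, 381 and 161 residues
```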

[0147] These data confirm that SIRE1 is a retroviral family whose genomic structure is based on a copia/Ty1-like organization. The genomic organization of all animal retroviruses (from vertebrates and Drosophila) is patterned after gypsy/Ty3-like retrotransposons. Neither retroviral genomes nor virions have been reported in plants, although both classes of retrotransposons are widespread. In plants, virus spread is mediated by intercellular movement (Mushegian and Koonin, 1993). However, very few plant virus genomes encode an env gene. Those that do (rhabdoviruses and bunyaviruses; Matthews, 1991) also infect animal hosts, in which env proteins mediate viral-host cell membrane fusion. Plant cell walls may preclude this mode of virus transfer, and whether the env proteins of these viruses serve any function in their plant hosts is not known. Thus, the presence of an env gene in SIRE1 suggests that SIRE1 may have originally been an infectious invertebrate retrovirus.

[0148] The overall restriction site homogeneity, the presence of long, uninterrupted ORFs within and adjacent to SIRE1-1, and the near identity of the 5′ and 3′ SIRE1-1 LTRs suggest that SIRE1-1 is not an evolutionary relic and may be modified to function as an infectious retrovirus and/or intracellular retrotransposon.

[0149] The genomic clone may be used as a SIRE1 genomic probe. The probe may be hybridized to Southern blots of complete and partial digests of soybean DNA to generate a consensus restriction map (Sambrook et al., 1989). Additionally, restriction maps of additional clones and the genomic DNA consensus may be compared to more fully assess SIRE1 heterogeneity. The polymorphic sequences of clone populations may then be used to determine expression-related features and phylogenetic relationships to other plant and animal elements.

[0150] The env, gag, and pol nucleotide sequences may be used to generate oligonucleotide or cDNA probes to detect transcription of these regions (Navot et al., 1989), and antibodies generated against SIRE1 proteins may be used to detect retroviral protein expression in various plant tissues (Hsu and Lawson, 1991). Moreover, reverse transcriptase (RT) and integrase (int) probes may be created by restriction digestion or PCR and used to assess the functional significance of the unprecedented length of SIRE1.

EXAMPLE 3

[0151] Northern Hybridization Analysis of SIRE1 Transcriptional Activity

[0152] The use of the SIRE1-1 polynucleotide as a tool for genetic engineering may require the expression of sequences therefrom. It may therefore be desirable to determine growing conditions under which plants or plant cell cultures that have been infected or transduced with SIRE1-derived DNA exhibit elevated or depressed transcriptional activity. There are many examples in which the transcriptional activity of a virus is enhanced during periods in which its host experiences environmental stress. Therefore, experiments may be conducted to determine growth conditions (or conditions of stress) optimal for the regulation of SIRE1 expression.

[0153] The presence of SIRE1-specific transcripts in plants such as soybean may be evaluated by Northern hybridization (Sambrook et al., 1989). For example, several G. max cultivars, including the Asgrow Mutable line, an unstable soybean isolate (Groose & Palmer, 1987; Groose et al., 1983), and Glycine soja strains (from a range of origins) may be grown from seed obtained from the U.S. Regional Soybean Laboratory in Urbana, Ill.

[0154] Plants may be grown under optimal and adverse (stress) conditions in growth chambers or in a greenhouse, and the transcriptional activity of SIRE1 in plants subjected to adverse conditions may then be compared to that in plants grown under normal conditions.

[0155] Many potential adverse growing conditions are well known in the art. For example, seedlings may be grown in vermiculite and subjected to temperatures ranging from 15° C. to 40° C. Plants may also be subjected to salt stress by applying NaCl solutions of up to 2%, or to osmotic stress by adding solutions containing PEG 8000. Plants growing under one or several of these conditions may be harvested at various times to assess the temporal relationship of the adverse condition to the transcriptional activity of SIRE1. To assess the impact of viral infection, leaf tissue may be inoculated with a virus such as soybean mosaic virus and harvested at 2, 5, 10 and 20 days after infection (Mansky et al., 1991).
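When planning a salt-stress series, it can help to express percent NaCl as a molar concentration. The sketch assumes the percentages are weight per volume (g per 100 mL) and uses the molar mass of NaCl (58.44 g/mol); under that assumption the 2% upper limit corresponds to roughly 342 mM.

```python
NACL_MOLAR_MASS = 58.44  # g/mol


def percent_wv_to_mM(percent_wv: float, molar_mass: float = NACL_MOLAR_MASS) -> float:
    """Convert % (w/v, i.e. g per 100 mL) to millimolar."""
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / molar_mass * 1000.0


if __name__ == "__main__":
    for pct in (0.5, 1.0, 2.0):
        print(f"{pct}% NaCl ~ {percent_wv_to_mM(pct):.0f} mM")
```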

[0156] In addition, the transcriptional activity of SIRE1 may be assessed in plant tissue cultures. Tissue cultures may be initiated from roots, cotyledons, or leaves of selected cultivars as described (Amberger et al., 1992; Roth et al., 1989; Shoemaker et al., 1991). Tissue can then be transferred to Petri plates containing Gamborg's B5 medium supplemented with kinetin, casein hydrolysate, and 2,4-D at concentrations ranging from 1 to 20 μM. After the formation of callus, suspension cultures may be initiated and maintained in liquid medium (Roth et al., 1989). These cultures may then be exposed to adverse growing conditions as described above.
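Media recipes often list auxins in mg/L rather than μM, so a small conversion helper may be useful. The sketch uses the molar mass of 2,4-dichlorophenoxyacetic acid (C8H6Cl2O3, about 221.04 g/mol). The 10 mM stock concentration in the example is a hypothetical choice, not a value from the text.

```python
MW_24D = 221.04  # g/mol, 2,4-dichlorophenoxyacetic acid (C8H6Cl2O3)


def uM_to_mg_per_L(conc_uM: float, mw: float) -> float:
    """µmol/L multiplied by g/mol gives µg/L; divide by 1000 for mg/L."""
    return conc_uM * mw / 1000.0


def stock_volume_uL(final_uM: float, final_mL: float, stock_mM: float) -> float:
    """Stock volume in µL from C1V1 = C2V2 (µM x mL / mM = µL)."""
    return final_uM * final_mL / stock_mM


if __name__ == "__main__":
    for c in (1, 5, 10, 20):
        print(f"{c} uM 2,4-D = {uM_to_mg_per_L(c, MW_24D):.3f} mg/L")
    # Hypothetical 10 mM stock diluted into 1 L of medium at 20 uM:
    print(stock_volume_uL(20, 1000, 10), "uL of stock")
```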

[0157] Total RNA may be isolated from seeds, cotyledons, leaves, roots, shoot tips, or cultured cells using commercial kits such as RNeasy™ (Qiagen, Chatsworth, Calif.). If necessary, polyadenylated RNA may be isolated from total RNA using the PolyATtract™ mRNA isolation system (Promega, Madison, Wis.). Isolated RNA may then be applied to nylon membranes (Gene Screen Plus, New England Nuclear, Boston, Mass.) using a slot-blot apparatus, denatured, and probed with end-labeled oligomers or radiolabeled cDNAs corresponding to the gag or pol regions of SIRE1-1 (Sambrook et al., 1989). RNA samples that give positive signals may be fractionated on 1% agarose-formaldehyde gels, blotted to nylon membranes, and probed as above. Preliminary studies of SIRE1 RNA transcripts in G. max (using the slot-blot procedures described above) have revealed the presence of high levels of gag transcripts in leaf tissues.

[0158] As retroelements commonly produce polyprotein-encoding transcripts that traverse nearly the entire element, functional SIRE1 transcripts could exceed 10 kb in length. This could limit the applicability of agarose-formaldehyde gel separations. Alternatively, isolated RNA can be analyzed for the presence of SIRE1 transcripts by ribonuclease (RNase) protection assays well known in the art. For example, RNA isolated from plants grown under the above-described conditions can be hybridized in solution to a SIRE1-derived radiolabeled RNA probe and then exposed to one or more of several available RNases. The double-stranded hybrid formed by the probe and target RNA is protected from RNase digestion. The protected RNA can be fractionated on a denaturing polyacrylamide gel, blotted to a nylon membrane, and visualized by autoradiography.

EXAMPLE 4

[0159] Detection of Retroelement Proteins by Western Hybridization Analysis

[0160] Plant tissue samples that contain SIRE1-specific transcripts may be analyzed for the presence of SIRE1-specific proteins or of proteins expressed by heterologous genes inserted into a SIRE1-derived vector. Protein recovered from these tissues may be spotted on nylon membranes and assayed for the presence of nucleocapsid, protease, and RT polypeptides by Western hybridization (Sambrook et al., 1989).

[0161] Polyclonal antisera against the SIRE1 proteins (or fusion constructs containing SIRE1 and heterologous peptide sequences) to be detected in these hybridizations can be obtained using methods well known in the art. For example, oligopeptides may be designed and synthesized using sequence information from the cDNA and genomic clones. The synthetic oligopeptides may be coupled to a carrier protein using, for example, glutaraldehyde, and antibodies against them raised in rabbits and affinity-purified as is well known in the art (Harlow and Lane, 1988).

[0162] Alternatively, polyclonal antisera may be raised against fusion proteins produced by inserting the appropriate SIRE1 DNA fragments (or DNA encoding the heterologous proteins) into a protein expression vector such as pPROEX-1 (Life Technologies, Gaithersburg, Md.) and isolating the fusion protein according to the manufacturer's instructions.

[0163] Monoclonal antibody preparations against SIRE1 proteins or fusion proteins may also be isolated from hybridoma cells derived from splenocytes or thymocytes of mice immunized with such proteins according to methods well known in the art (Harlow and Lane, 1988).

EXAMPLE 5

[0164] In Vitro Transcription and Translation of SIRE1 Transcripts

[0165] It may be desirable to produce SIRE1 polypeptides in vitro for use in producing antibodies or for capsid reconstitution studies, and to provide reagents for in vitro packaging of retroviral polynucleotides. Production of SIRE1 polypeptides in a cell-free environment may be accomplished by creating cDNAs from SIRE1 mRNA transcripts, inserting those cDNAs into plasmids, propagating the plasmids, and utilizing such plasmids in in vitro transcription/translation reactions as are well known in the art. cDNAs may be recovered from full-length SIRE1 transcripts isolated from soybean total or poly(A)-selected RNA. Such cDNAs may be produced using reagents and reactions optimized for long transcripts (Nathan et al., 1995). Total or poly(A)-selected soybean RNA may be reverse-transcribed with SuperScript II™ reverse transcriptase (Life Technologies, Gaithersburg, Md.) using an oligo(dT) primer. RNase H may be added and the single-stranded cDNA amplified using LA Taq DNA polymerase (Oncor) with oligo(dT) and 5′ primers derived from the proximal end of the SIRE1-1 gag and/or env cDNA sequences. The 5′ end of each PCR primer may contain a restriction enzyme recognition sequence for subsequent vector ligation in the appropriate orientation, as well as sequences that would facilitate enhanced transcription and/or translation.
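Adding a restriction site to the 5′ end of a PCR primer, as described above, amounts to string concatenation. In this sketch the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences are real; the 4-nt "GCGC" clamp and the gene-specific sequence are hypothetical placeholders and are not SIRE1-1 primers. The Wallace rule (2 °C per A/T, 4 °C per G/C) is only a rough estimate for short oligonucleotides and is applied here to the gene-specific portion, since the 5′ tail does not anneal to the template in the first cycles.

```python
# Real recognition sequences; the enzymes chosen are illustrative.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def add_site(gene_specific: str, enzyme: str, clamp: str = "GCGC") -> str:
    """Prefix a gene-specific primer with a clamp (hypothetical) and a restriction site."""
    return clamp + SITES[enzyme] + gene_specific.upper()


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm (deg C) = 2(A+T) + 4(G+C); meaningful only for short oligos."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    gsp = "ATGGCTTCAGCTAAC"  # hypothetical gene-specific sequence
    primer = add_site(gsp, "EcoRI")
    print(primer, wallace_tm(gsp), round(gc_fraction(gsp), 2))
```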

[0166] Amplified cDNAs may be initially characterized by agarose gel electrophoresis and Southern hybridization using gag-, pol-, and env-specific cDNA or oligonucleotide probes. The amplified DNAs may be ligated into pSPORT-1 (Life Technologies, Gaithersburg, Md.), a vector designed to carry large inserts, and the recombinant plasmids used to transform competent E. coli DH5α cells (Life Technologies, Gaithersburg, Md.). Plasmid DNA may be recovered from transformants and evaluated by restriction mapping and Southern hybridization as described above. Selected regions of several cDNAs may be sequenced with primers based on the sequence obtained from the genomic SIRE1-1 clone. cDNA variability may be assessed and quantitatively compared to that observed with Tnt1 transcripts in tobacco, which constitute a quasispecies-like collection (Casacuberta et al., 1995). The transcriptional initiation site(s) may be evaluated by primer extension and/or S1 nuclease digestion (Sambrook et al., 1989).

[0167] Alternatively, a parallel series of experiments may be run to generate translatable mRNAs. SIRE1-specific cDNAs may be generated as above, except that the 5′ PCR primer may be derived from the beginning of the gag and pol coding regions. The cDNA sequence suggests that a single gag-pol ORF may not be present in SIRE1-1, so that translation of the downstream pol region would require readthrough of a stop codon and/or a frameshift. The ribosomes of an in vitro translation system may not reproduce such in vivo events. For expression of the pol region, the cDNAs may therefore be amplified using a 5′ primer derived from the proximal end of the pol ORF.

[0168] Plasmid DNAs containing SIRE1 cDNAs may be recovered, and coupled in vitro transcription-translation assays may be run (Switzer and Heneine, 1995) using a reticulocyte lysate system (Promega, Madison, Wis.). Translation products may be analyzed by SDS-PAGE and Western hybridization as described above.

[0169] As an alternative to coupled in vitro transcription and translation, SIRE1 cDNAs may be cloned into the protein expression vector pPROEX-1 (Life Technologies, Gaithersburg, Md.), and fusion proteins expressed in E. coli and recovered as described by the manufacturer. SIRE1 cDNAs utilized in the above-mentioned reactions could include those encoding analogs, homologs, or fragments of the full-length SIRE1 gag, pol, or env proteins. These proteins, although not identical to proteins encoded by the SIRE1-1 polynucleotides disclosed herein, may nevertheless be useful if they retain at least one biological property of SIRE1 proteins. Such proteins may be used for antibody generation as described above, or for subsequent protein conformation studies.

EXAMPLE 6

[0170] Modification of SIRE1 for Use in Non-Replicative Transduction of Plant Cells

[0171] SIRE1 may be adapted for use as a retroviral vector in legumes (e.g., soybean, common beans, and alfalfa), cereals (e.g., rice, wheat, and barley), and other agronomically important crops such as fruit trees, conifers, and hardwoods. The use of a plant retrovirus for introduction of DNA sequences into plant cells presents several advantages over previously known methods. First, unlike other plant viral vectors (Joshi and Joshi, 1991; Potrykus, 1991), the SIRE1 pro-retrovirus may integrate into the host genome and generate stable transformants (Crystal, 1995; Miller, 1992; Smith, 1995).

[0172] Second, although other vectors have been used to introduce nucleic acid into plant genomes, they have serious limitations. For example, Ti plasmid-based vectors lead to integrative transformation, but their bacterial host, Agrobacterium tumefaciens, has a limited host range that does not include many legumes or most cereals (Christou, 1995; Potrykus, 1991).

[0173] Finally, physical transformation methods (i.e., biolistic projection or microinjection) are far less efficient than viral infection in introducing DNA constructs into desired cells. These physical methods also generally require regeneration of adult plants by somatic embryogenesis (Christou, 1995; Potrykus, 1991).

[0174] A full-length SIRE1 pro-retroviral DNA and vectors derived therefrom will be competent to effect transduction into plant host cells and integration into the host genome, using any of the foregoing methods. However, it may be desirable to modify SIRE1 vectors so as to limit the region of integration, to restrict subsequent transposition events, to add DNA sequences that promote homologous recombination between a vector and a target region of the genome, and to ensure against infectious spread of a potentially pathogenic agent.

[0175] SIRE1 may be modified in a manner analogous to that used for vertebrate retroviruses to create recombinant viral vectors that may infect host cells but not complete an infection cycle. For vertebrate retroviral vectors, this is accomplished by deleting or disabling the trans-acting elements (i.e., gag, pol, and env) in the vector to be transduced into the host cell, while leaving intact the cis-acting elements (i.e., LTRs and packaging signals). This is followed by transduction of the modified vector into retrovirus packaging cell lines or tissue cultures (Miller, 1992; Smith, 1995) that may contribute the necessary trans-acting elements.

[0176] Thus, the present invention contemplates SIRE1 constructs in which sequences encoding the trans-acting factors (e.g., gag, pol, and env), the LTRs, or the packaging signals have been mutated or deleted, either singly or in combination. Mutations may be easily accomplished using PCR-mediated site-directed or cassette mutagenesis techniques as are well known in the art.

[0177] The trans-factor-encoding sequences may be deleted by digestion of the SIRE1-1 viral DNA with appropriate restriction enzymes. Those of ordinary skill in the art will be readily able to determine the appropriate restriction enzyme recognition sites in the SIRE1 DNA that will allow for removal of the appropriate trans-factor DNA segments while leaving intact essential cis-element sequences. One approach would be to digest the SIRE1 DNA with a restriction enzyme that cleaves at sites located at or near the 5′ and 3′ boundaries of the ORF2 region (FIG. 14), such that all or part of the env-encoding region could be removed from the vector.
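The deletion strategy can be modeled, in simplified form, as removing the top-strand sequence between two occurrences of the same recognition site and rejoining the ends. This sketch ignores overhang chemistry and the bottom strand, and the example sequence is hypothetical; EcoRI (G^AATTC, cut after the first base) is used only as an illustrative enzyme, not as a site known to flank ORF2.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def excise_between(seq: str, site: str, cut_offset: int) -> str:
    """Remove the segment between the first and last occurrence of `site`,
    cutting each site `cut_offset` bases after its start (top strand only),
    then rejoin the ends as if compatible termini were religated."""
    positions = find_sites(seq, site)
    if len(positions) < 2:
        raise ValueError("need at least two sites to excise a segment")
    left_cut = positions[0] + cut_offset
    right_cut = positions[-1] + cut_offset
    return seq[:left_cut] + seq[right_cut:]


if __name__ == "__main__":
    toy = "AAAGAATTCTTTTGAATTCCCC"  # hypothetical construct with two EcoRI sites
    print(excise_between(toy, "GAATTC", 1))  # one regenerated site remains
```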

[0178] Restriction digestion may be followed by recovery and purification of the digested vector DNA fragments containing cis-factor sequences, followed by religation of the digested termini (Sambrook et al., 1989). Alternatively, appropriate double-stranded DNA linkers may be ligated to the digested ends of the vector DNA in order to maintain or create a proper reading frame. As another possibility, linker sequences containing one or more restriction endonuclease recognition sites may be ligated to the ends of the digested vector DNA, and these ends then religated in order to facilitate subsequent insertion of heterologous gene sequences.
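Choosing a linker that "maintains a proper reading frame" reduces to modular arithmetic: the net change in length (linker inserted minus sequence deleted) must be a multiple of 3 for downstream codons to stay in frame. A minimal sketch; the deletion sizes and minimum linker length in the example are hypothetical.

```python
def linker_length_for_frame(deleted_nt: int, min_linker: int = 0) -> int:
    """Smallest linker length >= min_linker such that (linker - deleted)
    is a multiple of 3, leaving the downstream reading frame unchanged."""
    length = min_linker
    while (length - deleted_nt) % 3 != 0:
        length += 1
    return length


if __name__ == "__main__":
    # Hypothetical deletions of 9 and 10 nt, with linkers of at least 8 nt
    # (e.g. long enough to carry a restriction site).
    for deleted in (9, 10):
        print(deleted, linker_length_for_frame(deleted, min_linker=8))
```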

[0179] Infection of packaging cells or tissue cultures with the modified SIRE1 vector may allow for the recovery and use of a non-replicative recombinant vector in a functional virion particle that may be capable of intercellular transport (for example, through plasmodesmata), host cell penetration, nuclear targeting, and chromosomal integration, but incapable of further transposition. Reporter genes such as GUS (β-glucuronidase; Jefferson et al., 1981) or Npt-II (neomycin phosphotransferase; Pridmore, 1987) and others (Croy, 1994) may also be incorporated into SIRE1 or vectors derived therefrom to allow detection of integration events.

EXAMPLE 7

[0180] Production of Plant Retroviral Packaging Cells

[0181] Modification of pro-retroviruses for use as vectors is fairly straightforward. In essence, retroviral vectors are simple, containing the 5′ and 3′ LTRs, a packaging sequence, and a transcription unit composed of the recombinant gene or genes of interest and appropriate regulatory elements, which include the LTRs but may also include heterologous regulatory elements. To grow the vector, however, the missing trans-factors must be provided using a so-called packaging cell line. Such a cell is engineered to contain integrated copies of gag, pol, and env, but to lack a packaging signal so that no “helper virus” sequences become encapsidated. Additional features may be added to or removed from the vector and packaging cell line to render the vectors more efficacious or to reduce the possibility of contamination by “helper virus.”

[0182] A packaging cell line is produced by transfection of a helper virus plasmid encoding gag, pol, and env and by selecting for cells that express the proteins and that can support vector production (Miller, 1990). To avoid replication of helper sequences, one may make deletions in, for example, the packaging signal regions. To avoid recombination between the packaging vector and the replicating vector, the 3′ LTR is commonly deleted and replaced with a polyadenylation sequence (Dougherty et al., 1989). Deletions may also be incorporated into the 5′ LTR to reduce its ability to replicate, and a heterologous promoter may be inserted downstream to maintain expression of the trans-factors (Miller, 1989). Finally, the viral genome may be split into two transcription units, one encoding gag and pol and a second encoding env (Markowitz, 1988). The cis-acting factors may be deleted or modified in these vectors in order to prevent production of replication-competent retrovirus by the packaging cells.

[0183] The trans-acting factors encoded by the helper virus construct may include the native factors from SIRE1, modified SIRE1 factors, or other proretrovirus-derived factors that may result in an increased or alternative host range, higher efficiency of viral production, or higher transduction efficiency (Smith, 1995). Thus, the present invention encompasses vectors containing sequences encoding the trans-acting factors from SIRE1, either singly or in various combinations, for use in creating packaging cells, and the packaging cells themselves.

[0184] To manipulate target cell specificity, the env gene of the helper virus/packaging cell line may be varied. A successful approach has been to remove sequences from the env gene and replace them with sequences encoding proteins with a different specificity (Russell et al., 1993). For example, erythropoietin sequences have been incorporated into mammalian retroviruses to target the EPO receptor (Kassahara et al., 1994). Another approach has been to incorporate a single-chain antibody into the env sequence (Chu et al., 1994). Finally, the ability of retroviruses to incorporate glycoproteins from other viruses into their envelope has been utilized to produce so-called pseudotypes (Dong et al., 1992). The pseudotype retrovirus acquires the infective range of the glycoprotein donor, and is usually more stable as well. Analogous strategies may be used in SIRE1 retroviral vectors to extend the host range beyond soybean by inserting into the SIRE1 env gene ligand-, receptor-, or single-chain antibody-encoding fragments that could recognize, or be recognized by, proteins from other plant species, such as rice or maize.

EXAMPLE 8

[0185] Transduction of the SIRE1-1 Plant Proretrovirus into Plant Cells

[0186] If the SIRE1 proretrovirus or vectors derived therefrom integrate into the genome of a cell transduced with such DNA, all cells derived from the original transfected cell may contain the retroviral insertion. Infections are commonly targeted to embryonic, meristematic, or germ line cells to enable transmission to progeny plants. Since certain plants (such as G. max) are self-fertilizing, transfection of embryos or meristematic tissue may lead to homozygosity of the inserted DNA in some F₁ offspring, although the proportion of seed homozygous for a particular insertion event may need to be tested empirically. Dominant changes may be manifested in heterozygous progeny. Transfection of various adult tissues, especially meristems and ovaries, or of seeds, pollen, protoplasts, or callus, may be performed by standard inoculation and/or co-incubation techniques, which are well known (Potrykus, 1991). Viruses may also be inoculated into phloem for transport to distant sites. In some cases, physical methods such as biolistic projection, microinjection, or macroinjection may be necessary or preferred to transduce SIRE1-1 into plant cells or tissues (Draper and Scott, 1991; Potrykus, 1991).
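As a baseline for the empirical testing mentioned above, the idealized Mendelian expectation can be enumerated. The sketch assumes a single insertion at one locus, disomic (diploid-like) inheritance, a non-chimeric germ line, and no selection against the insertion; under those assumptions selfing a hemizygous plant gives 1/4 homozygous, 1/2 hemizygous, and 1/4 null progeny, so 3/4 would carry the insertion and could show a dominant change. Chimeric transformants or multiple insertions would depart from these ratios.

```python
from fractions import Fraction
from itertools import product


def selfing_genotypes() -> dict[str, Fraction]:
    """Expected genotype frequencies after selfing a plant hemizygous for one
    insertion ('T' = allele with insertion, '0' = empty allele).
    Assumes single-locus, disomic inheritance with equal gamete frequencies."""
    gametes = ["T", "0"]  # each produced with probability 1/2
    freqs: dict[str, Fraction] = {}
    for g1, g2 in product(gametes, repeat=2):
        n = (g1 == "T") + (g2 == "T")
        label = {2: "homozygous", 1: "hemizygous", 0: "null"}[n]
        freqs[label] = freqs.get(label, Fraction(0)) + Fraction(1, 4)
    return freqs


if __name__ == "__main__":
    f = selfing_genotypes()
    print(f)
    print("carriers:", f["homozygous"] + f["hemizygous"])
```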

EXAMPLE 9

[0187] Use of SIRE1 as a Gene Transfer Vector

[0188] SIRE1 may be modified to carry useful gene sequences (e.g., gene sequences encoding useful proteins) or, alternatively, genes that produce antisense transcripts against undesirable endogenous sequences, or to introduce into the genome gene regulatory elements which may regulate transcription of an adjacent gene. This may be easily accomplished by restriction enzyme digestion of the vector DNA at sites near the 5′ and 3′ boundaries of the ORFs encoding the gag, pol, and/or env proteins (as described above), isolating the remaining vector DNA, and either ligating a heterologous DNA fragment between the digested vector termini or, alternatively, recombinantly inserting a multicloning site (Sambrook et al., 1989) between the digested vector termini to allow for subsequent facile restriction enzyme digestion and recombination of digested vector and heterologous DNAs. Heterologous gene sequences may be operably linked to (heterologous) host-cell-specific promoter sequences (Waugh and Brown, 1991), or their transcription may be driven by the SIRE1 LTR promoter activity. The heterologous gene sequences may encode any of a variety of polypeptides whose expression may result in useful phenotypic changes in the host cell and plant. By way of example, introduction and expression of these heterologous gene sequences in plants may result in the following exemplary phenotypic variations:

[0189] A. Disease Resistance

[0190] Many agronomically important crops are susceptible to a variety of diseases, viral infections, and bacterial or fungal infestations. Resistance to these conditions results in higher crop yields and decreased use of bactericidal and fungicidal compositions. Transfer of genes conferring resistance to diseases and/or viral or bacterial infection is an object of the present invention.

[0191] Many plant genomes, including that of soybean, are currently being mapped (Keim et al. 1996). In addition, genetic loci associated with disease resistance have been identified in many plant lines. For example, resistance markers and quantitative trait loci (QTL) for many soybean diseases have been linked to restriction fragment length polymorphism (RFLP), RAPD (randomly amplified polymorphic DNA), and STS (sequence-tagged site) genome markers. These diseases include bacterial blight, downy mildew (Bernard and Cremeens, 1971), phytophthora root rot (Diers et al. 1992), powdery mildew (Lohnes and Bernard, 1992), soybean root-knot nematode infection (Luzzi et al. 1994), phomopsis seed decay, cyst nematode infection (Baltazar and Mansur 1992; Boutin et al. 1992; Rao-Arelli et al. 1992; Young 1996), soybean mosaic virus (Chen et al. 1993), soybean rust (Hartwig and Bromfield 1983), stem canker (Bowers et al. 1993; Kilen and Hartwig 1987), sudden death syndrome (Prabhu et al. 1996), purple seed stain and leaf blight, and brown spot disease.

[0192] Both YAC (yeast artificial chromosome) and BAC (bacterial artificial chromosome) soybean libraries have been constructed (Funk and Colchinsky, 1994), and resistance markers have been assigned to particular clones in these libraries. The availability of these gene sequences will allow DNA fragments encoding such genes to be inserted into SIRE1 proretrovirus-derived vectors of the present invention using standard recombinant techniques, as described above (Sambrook et al., 1989). The recombinant vector may then be transduced into target plant cells, where the resistance gene may be expressed episomally or following integration of the vector into the host plant genome.

[0193] Transfer of resistance to viral infection to target plant cells is an important object of the present invention. Expression of a viral coat protein in a plant has been shown to diminish the ability of the virus to subsequently infect the plant and spread systemically; thus, viral resistance may be mediated by vector-sponsored transfer of viral gene sequences into susceptible plant hosts (Beachy, 1990; Fitchen and Beachy, 1993). Many different viral coat protein genes have been introduced into plant genomes, expressed, and found to confer viral tolerance, including those of tobacco mosaic virus, cucumber mosaic virus, alfalfa mosaic virus, tobacco streak virus, tobacco rattle virus, potato viruses X and Y, and tobacco etch virus (Beachy, 1990; Gasser and Fraley, 1989; Golemboski et al., 1990; Hemenway et al., 1988; Hill et al., 1991). This approach to viral resistance is especially promising, because introducing the coat protein of one virus with the vectors of the present invention may often confer tolerance to a range of seemingly unrelated viruses (Beachy, 1990). Moreover, transgenic plants expressing viral coat proteins exhibit viral tolerance in the field as well as in the laboratory (Nelson et al., 1988).

[0194] Plants may also be transformed with a retroviral vector encoding an antisense RNA complementary to a plant virus polynucleotide. Expression of antisense RNA against viral sequences may provide tolerance to the virus by interfering with either the translation of viral mRNAs or the replication of the viral genome. Expression of antisense RNA has been found to confer viral resistance in, among others, potato, tobacco, and cucumber plants (Beachy, 1990; Day et al., 1991; Hemenway et al., 1988; Rezaian et al., 1988).

[0195] Using the present invention, DNA fragments encoding viral coat proteins or antisense RNA complementary to viral RNA transcripts may be recombinantly inserted into the SIRE1 proretrovirus, transduced into susceptible plants, and expressed to confer resistance to a virus.

[0196] B. Herbicide Tolerance

[0197] The use of herbicides is limited in part by their toxicity to crop species and by the development of resistance in “weed” species (Hathaway, 1989). Increasing crop tolerance to herbicides may increase yield and broaden the spectrum of herbicides available to curtail weed growth. A wider range of suitable herbicides may also retard the development of resistance in weed species (LeBaron and McFarland, 1990), thereby decreasing the overall need for herbicides. Herbicide classes include, for example, acetanilides (e.g., alachlor), aliphatics (e.g., glyphosate), dinitroanilines (e.g., trifluralin), diphenyl ethers (e.g., acifluorfen), imidazolinones (e.g., imazapyr), sulfonylureas (e.g., chlorsulfuron), and triazines (e.g., atrazine).

[0198] Two general approaches may be taken to engineer herbicide tolerance. One may alter the level or sensitivity of the herbicide's target enzyme, for example by altering the enzyme itself or by decreasing the level or activity of a herbicide transporter. Alternatively, one may introduce a gene that detoxifies the herbicide, or increase the activity of such a gene (Hathaway, 1989; Stalker, 1991).

[0199] An example of the first approach is the introduction (using the vectors and viruses of the present invention) into various crops of genetic constructs leading to overexpression of the enzyme EPSPS (5-enolpyruvylshikimate-3-phosphate synthase), or of isoenzymes thereof exhibiting increased tolerance. Such overexpression confers resistance to glyphosate, the active ingredient in the widely used herbicide Roundup™ (Shah et al., 1986). The gene for EPSPS was isolated from glyphosate-resistant E. coli, placed under the control of a plant promoter, and introduced into plants, where it conferred resistance to the herbicide. Transgenic lines resistant to glyphosate have been developed in tobacco, petunia, tomato, potato, cotton, and Arabidopsis (della-Cioppa et al., 1987; Gasser and Fraley, 1989; Shah et al., 1986).

[0200] Similarly, resistance to sulfonylurea compounds, the active ingredients in Glean™ and Oust™ herbicides, has been produced by introducing site-specific mutant forms of the gene encoding acetolactate synthase (ALS) into plants (Haughn et al., 1988). Resistance to sulfonylureas has been transferred by this method to tobacco, Brassica, and Arabidopsis (Miki et al., 1990).

[0201] Bromoxynil is a herbicide that acts by inhibiting photosystem II. Rather than modifying the target plant gene, researchers have conferred resistance to bromoxynil by introducing a gene encoding a bacterial nitrilase, which can inactivate the compound before it reaches its target. This strategy has been used to confer bromoxynil resistance on tobacco plants (Stalker et al., 1988).

[0202] Genes encoding wild-type or mutant forms of endogenous plant enzymes targeted by herbicide compounds, or enzymes that inactivate herbicide compounds, may be recombinantly inserted into SIRE1 or vectors derived therefrom and transduced into plant cells. The genes may then be expressed under the control of plant- or tissue-specific promoters (Perlak et al., 1991) to confer herbicide resistance on the transformed plant. Overexpression of normal or mutant forms of enzymes already present in the wild-type progenitor plant is preferred, because this may decrease the probability of deleterious effects on crop performance or product quality.

[0203] 1. Insect Resistance

[0204] Transduction of functional genes encoding insecticidal products into plants may lead to crop strains that are intrinsically tolerant of insect predators. Such plants would not have to be treated with expensive and ecologically hazardous chemical pesticides. In addition, such biological insecticides may be effective at much lower concentrations than exogenously applied synthetic pesticides. Because they are highly specific, they are also generally not hazardous to consumers of the food.

[0205] Insect resistance in plants is generally provided by toxins or repellents (Gatehouse et al., 1991). Using the present invention, insecticidal protoxin genes derived from, for example, several subspecies of Bacillus thuringiensis (Vaeck et al., 1987) may be transduced into plant cells and constitutively expressed therein. This protoxin does not persist in the environment and is non-hazardous to mammals, making it a safe means of protecting plants. The gene for the toxin has been introduced and selectively expressed in a number of plant species, including tomato, tobacco, potato, and cotton (Gasser and Fraley, 1989; Brunke and Meussen, 1991).

[0206] The trypsin inhibitor protein from cowpea is also an effective insecticide against a variety of insects: its presence restricts the ability of insects to digest food by interfering with the hydrolysis of plant proteins (Hilder et al., 1987). Because the trypsin inhibitor is a natural plant protein, it may be expressed in plants without adversely affecting the physiology of the host. There are, however, several potential drawbacks to the use of the cowpea trypsin inhibitor. Relative to the B. thuringiensis toxin, higher concentrations of inhibitor are required for insecticidal effectiveness (Brunke et al., 1991). Thus, production of the inhibitor may require a more powerful transcriptional promoter (Perlak et al., 1991) and may be more energetically costly for the host plant. In addition, the inhibitor is active in mammalian digestive systems unless inactivated prior to consumption. Inactivation may be accomplished by heating, however, so this may not be a significant drawback to the use of the inhibitor in most crop plants. Moreover, in most crops, expression of the inhibitor may be restricted to plant tissues that are most exposed to insect predators but are not consumed by mammals, such as leaves or roots. This can be achieved by operably linking tissue-specific promoter sequences to the inhibitor gene (Perlak et al., 1991).

[0207] These exemplary genes conferring insect resistance or repellence may be inserted into SIRE1 proretrovirus-derived vectors using recombinant methods well known in the art. These recombinant vectors may then be transduced into soybean and other plants. As more insect resistance and repellence genes are identified, they may be recombinantly inserted into the SIRE1-derived gene transfer vector and expressed in host plants.

[0208] C. Enhanced Nitrogen Fixation and/or Nodulation

[0209] Genes whose expression contributes to greater nitrogen fixation and nodulation (Gresshoff and Landau-Ellis, 1994; Qian et al. 1996) may be overexpressed in plant cells by transduction of a recombinant SIRE1 vector containing DNA fragments from which those genes may be expressed. Alternatively, the expression of genes that reduce nitrogen fixation or nodulation (Wu et al. 1995) may be modulated by SIRE1-mediated expression of recombinantly inserted DNA fragments encoding antisense transcripts. Manipulation of these genes may lessen or eliminate the current heavy reliance on nitrogen-based fertilizers.

[0210] 1. Enhanced Vigor and/or Growth

[0211] Genes from wild progenitor species or unrelated species whose expression confers economically valuable growth traits have been discovered (Allen, 1994; Takahashi and Asanuma, 1996). Such genes or gene fragments may be placed under the control of heterologous or native promoters to create a gene cassette, and such cassettes may be recombinantly inserted into SIRE1 or vectors derived therefrom. These recombinant vectors may then be transduced into plant cells, where expression of the proteins encoded by such genes may lead to plant phenotypes exhibiting economically valuable growth characteristics.

[0212] 2. Altered Seed Oil/Carbohydrate/Protein Production

[0213] Markers have been identified for several genes associated with soybean seed protein and oil content (Lee et al. 1996; Moreira et al. 1996). Transduction and expression of these genes within plants may result in several changes: greater seed oil production with lowered linolenic acid content, enhanced seed storage protein production, diminished levels of raffinose-derived oligosaccharides, decreased lipoxygenase levels, or decreased protease inhibitor content. Protease inhibitors may lower the nutritive value of some plant proteins in animal feed by reducing their hydrolysis in the digestive tract. Such genes may be recombinantly inserted into the SIRE1 proretrovirus or vectors derived therefrom. The recombinant virus or vector may then be used to introduce the genes into plants or plant cells, where they may be expressed and may influence the plant phenotype.

[0214] The potential food value of certain grains may be improved by altering the amino acid composition of their seed storage proteins. This may be accomplished in at least two ways. First, genes encoding heterologous seed storage proteins with a more desirable amino acid mix may be transferred, using the vectors and methods of the present invention, into plants whose own seed storage proteins have an undesirable amino acid composition. This approach has been used in several model studies: an oleosin gene from maize was successfully transferred and expressed in Brassica (Lee et al., 1991), and a phaseolin gene from a legume was expressed in tobacco plants, where the seed storage protein was appropriately compartmentalized (Altenbach et al., 1989).

[0215] Second, genes encoding endogenous seed storage proteins may be mutated to encode a more desirable amino acid composition and reintroduced into the host plant using the vectors of the present invention (Hoffman et al., 1988). The effect of these amino acid substitutions on protein conformation and compartmentalization may be lessened by targeting the substitutions to the hypervariable regions near the carboxy terminus of most seed storage proteins (Dickinson et al., 1990). Genes encoding proteins with altered amino acid compositions may be incorporated into the SIRE1 proretrovirus or vectors derived therefrom. The recombinant virus or vector may then be used to introduce the genes into plant cells, thereby changing the amino acid composition of the encoded proteins.

[0216] D. Heterologous Protein Production

[0217] The present invention contemplates recombinant SIRE1-1 virus or vectors derived therefrom that may be used to introduce genes encoding technical enzymes, heterologous storage proteins, or novel polymer-producing enzymes, thus allowing crops to become a novel source of these products.

EXAMPLE 10

[0218] Use of SIRE1-1 to Induce and Tag Mutations in a Plant Genome

[0219] An important object of this invention is the use of the SIRE1 proretrovirus to establish new landmarks in plant genomes and to induce and trace new mutations. SIRE1 may be used to link mutagenesis and element expression. Somaclonal variation has been demonstrated in soybean, for example (Amberger et al., 1992; Freytag et al., 1989; Graybosch et al., 1987; Roth et al., 1989), but little is known about the agents that induce these heritable changes. Persons of ordinary skill in the art will be able to identify new SIRE1 insertion sites in plant genomes and to correlate these new sites with variant phenotypes. Homozygosity at insertion sites may theoretically be achieved in the F₁ progeny. Dominant insertions may be distinguished from pre-existing integration events if the active element carries a reporter gene such as GUS or Npt. Phenotypes may then be correlated with the newly tagged genomic sites, and the sequences flanking these sites may be readily cloned and sequenced (Sambrook et al., 1989).

[0220] SIRE1 may also be used to investigate the relationship between “genomic stress” and transposable element activity. This can be done by searching the LTR regions for clues to the identity of host proteins that might regulate element expression. The presence and expression of these proteins may then be correlated with the adverse conditions known to induce element expression.

[0221] The availability of a functional proretrovirus in a major plant group has far-ranging applications to applied genetic manipulation and to basic biological problems concerning gene function, genome organization, and evolution. A better understanding of these issues may be valuable in identifying and mapping important new loci. Understanding the relationships between plant health and element mobilization may provide invaluable insights into the short- and long-term consequences of transposition. If retroelements have played a significant role in adaptive mutation in natural populations, then plant geneticists may be able to accelerate and direct the process to generate new resistant alleles. New insertion sites would be “tagged” by the element, and it may be possible to distinguish these sites from pre-existing loci by competitive hybridization schemes. It should then be possible to clone and characterize the disrupted loci. In addition, if the element has contributed to genotypic changes that have persisted under the pressure of selection, then important loci may be closely linked to the element. This linkage may make it easier to map and isolate coding regions by means of element-anchored polymorphisms.

EXAMPLE 11

[0222] Modification of SIRE1-1 Vectors to Effect Directed Integration

[0223] Retroviral integration systems show little target-site specificity, and random insertions into a target cell genome may have undesirable consequences. Integration near cellular proto-oncogenes may lead to ectopic gene activation and tumor production (Shiramazu et al., 1994), and random integration may also inactivate essential or desirable genes (Coffin, 1990). Therefore, the ability to direct the integration of a plant proretrovirus to a limited region of a target plant cell genome is highly desirable.

[0224] One way to achieve directed integration is to “tether” the integration machinery to a specific target sequence. This may be accomplished by fusing a sequence-specific DNA-binding domain to the integrase of the SIRE1 proretrovirus (Kirchner et al., 1995). Consider, for example, a protein known to bind a specific locus in a plant genome, such as a transcriptional enhancer of a gene whose expression is commercially disadvantageous. The nucleotide sequence encoding that protein's DNA-binding domain may be recombinantly inserted in frame, immediately downstream of the 3′ end of the SIRE1 nucleotide sequence encoding the carboxy terminus of the pol region (i.e., at the carboxy terminus of the integrase protein, which is a product of pol cleavage). The DNA-binding domain may then act to “guide” the integrase protein and the SIRE1 polynucleotide to the genetic locus to be insertionally mutated by SIRE1.
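The in-frame requirement above can be checked mechanically. The following Python sketch uses a hypothetical helper that appends a DNA-binding-domain coding sequence to the 3′ end of a coding region. It removes the region's native stop codon so that translation continues into the fusion, and it rejects fusions that would be out of frame or interrupted by an internal stop. The function names and the choice of TAA as an added terminator are assumptions for illustration only.

```python
STOPS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive triplets starting at position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(cds, dbd_cds):
    """Return cds with dbd_cds fused in frame at its 3' end (hypothetical helper)."""
    cds, dbd_cds = cds.upper(), dbd_cds.upper()
    if not cds or len(cds) % 3 or len(dbd_cds) % 3:
        raise ValueError("both sequences must contain whole codons")
    upstream = codons(cds)
    if upstream[-1] in STOPS:
        upstream = upstream[:-1]  # drop the native stop so translation runs into the DBD
    downstream = codons(dbd_cds)
    # Any stop before the final codon would truncate the fusion protein.
    if any(c in STOPS for c in upstream + downstream[:-1]):
        raise ValueError("internal stop codon would truncate the fusion")
    fused = "".join(upstream) + dbd_cds
    if not fused or codons(fused)[-1] not in STOPS:
        fused += "TAA"  # assumed terminator when the DBD sequence lacks one
    return fused


print(fuse_in_frame("ATGAAATAA", "GGGTTT"))  # ATGAAAGGGTTTTAA
```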

EXAMPLE 12

[0225] Determination of the SIRE1-1 Insertion Site in the Soybean Genome

[0226] The sequence of the genomic DNA flanking the SIRE1 genomic clone may be used to generate probes for determining the genomic insertion site. Restriction enzyme digests of genomic DNA from a variety of G. max cultivars, G. soja, and other plant species (for example, G. tabacina, G. canescens, and G. tomentella) will be fractionated electrophoretically on agarose gels, transferred to nylon membranes, and hybridized with the flanking-DNA probe(s). If a band to which the probe(s) hybridize is polymorphic, the relation of the polymorphism to the presence of a SIRE1 insert may be determined by hybridization with a SIRE1 LTR-specific probe. A SIRE1-related polymorphism among cultivars would strongly support functional transposition of the SIRE1 family in the recent past.

[0227] The above examples support the conclusion that SIRE1 is an endogenous family of proretroviruses whose genomic structure is based on a copia-like organization. In contrast, the genomic organization of all animal retroviruses (from vertebrates and Drosophila) is patterned after gypsy-like retrotransposons. Thus, SIRE1-1 is clearly a plant retroviral element that is evolutionarily far diverged from animal retroviruses.

[0228] Neither retroviral genomes nor virions have been reported in plants, although both classes of retrotransposons are otherwise widespread in nature. Therefore, SIRE1 is the first known plant proretrovirus. Few plant virus genomes encode an envelope protein. Those that do, the rhabdoviruses and bunyaviruses, also infect animal hosts, where envelope proteins sponsor fusion of the viral and host cell membranes. It is not known whether plant cell walls would preclude this mode of transfer.

[0229] SIRE1 may originally have been an invertebrate retrovirus. Its ability to integrate into plant genomes and the presence of envelope protein-encoding regions suggest that it may at one time have served as a “shuttle vector” between and among animal and plant hosts. Judging by its copy number, it has clearly been successful in G. max.

[0230] Several features indicate that SIRE1 is not an evolutionary relic but an active proretrovirus: the overall restriction-site homogeneity of family members, the presence of long, uninterrupted ORFs within and adjacent to the retroviral insert, the strong homology of the env, gag, int, RT, and RH domains to those of known retrotransposons, and the near-identity of the LTRs. As such, it may be used to influence the organization and expression of soybean and possibly other plant genomes.

EXAMPLE 13

[0231] DNA Sequence of SIRE1-7, SIRE1-8 and SIRE1-9

[0232] Because SIRE1-1 is unique among plant retrovirus-like elements in that its coding information does not appear to contain obvious mutations (Laten, Majumdar, and Gaucher 1998), a survey of additional retrovirus-like elements was conducted to assess sequence diversity within the SIRE1 family.

[0233] Clones containing SIRE1 sequences were recovered from a λ genomic library (Stratagene) by plaque hybridization (Sambrook, Fritsch, and Maniatis 1989). The probe encompassed the integrase (IN) and reverse transcriptase (RT) coding regions and most of the env-like gene of SIRE1-1 (Laten, Majumdar, and Gaucher 1998). DNAs were isolated from plate lysates (Qiagen) and amplified by standard protocols using recombinant Taq DNA polymerase (Life Technologies). Primer pairs were designed to amplify either the 5′ or the 3′ end of SIRE1-1 in order to screen for phage clones carrying full-length SIRE1 elements. The 5′ ends were amplified using an LTR forward primer (TGGAAGGTTGTAAACAGTGGC) (SEQ ID NO: 96) and a gag reverse primer (AGTCGAAAGGGATGTTCCG) (SEQ ID NO: 97). The 3′ ends were amplified using an env-like ORF forward primer (ACATTGTCTCGACACAGGG) (SEQ ID NO: 98) and an LTR reverse primer (ATATTTTCGGGCAGATG) (SEQ ID NO: 99).
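The end-specific PCR screen above can be mimicked in silico, for example to predict product sizes on a candidate clone sequence. The Python sketch below uses the four primer sequences from the text. The Wallace-rule melting temperature (2 °C per A/T plus 4 °C per G/C) is only a rough heuristic that we have substituted for illustration. The amplicon search is a simplifying assumption: it reports exact matches only, scanning the top strand of the template.

```python
PRIMERS = {
    "LTR_fwd": "TGGAAGGTTGTAAACAGTGGC",  # SEQ ID NO: 96
    "gag_rev": "AGTCGAAAGGGATGTTCCG",    # SEQ ID NO: 97
    "env_fwd": "ACATTGTCTCGACACAGGG",    # SEQ ID NO: 98
    "LTR_rev": "ATATTTTCGGGCAGATG",      # SEQ ID NO: 99
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough melting temperature (deg C): 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def amplicon_length(template, fwd, rev):
    """Predicted product size for exact primer matches on the top strand, else None."""
    t = template.upper()
    start = t.find(fwd.upper())
    if start < 0:
        return None
    # The reverse primer's binding site appears on the top strand as its reverse complement.
    end_site = t.find(revcomp(rev), start + len(fwd))
    if end_site < 0:
        return None
    return end_site + len(rev) - start


for name, seq in PRIMERS.items():
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    print(f"{name}: {len(seq)} nt, GC {gc:.0%}, Tm ~{wallace_tm(seq)} C")
```

A clone would be scored as carrying a full-length element only if both the 5′ (LTR_fwd/gag_rev) and 3′ (env_fwd/LTR_rev) pairs return a product.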

[0234] For sequencing, phage DNAs were isolated from plate lysates (Qiagen). SIRE1-7, 1-8, and 1-9 DNAs were sequenced directly from recombinant phage. SIRE1-7 (GenBank Accession No. AY205609), SIRE1-8 (GenBank Accession No. AY205610), and SIRE1-9 (GenBank Accession No. AY205611) are unique, distinct genomic copies of the multicopy endogenous retrovirus family SIRE1, derived from a Glycine max lambda genomic library. The DNA sequences of SIRE1-7 (SEQ ID NO: 87), SIRE1-8 (SEQ ID NO: 90), and SIRE1-9 (SEQ ID NO: 93) each contain two open reading frames, ORF1 and ORF2 (SEQ ID NOs: 88 and 89, 91 and 92, and 94 and 95, respectively). These ORFs can be conceptually translated into the full complement of intact polypeptides characteristic of functional retroviruses. Analysis of ORF1 of SIRE1-7, 1-8, and 1-9 (SEQ ID NOs: 88, 91, and 94, respectively) showed that it encodes a polyprotein (a.k.a. gag-pol). This polyprotein encompasses gag (including zinc finger domains and coat protein), aspartic acid protease, integrase, and reverse transcriptase-ribonuclease H coding sequences. The ORF2 regions of SIRE1-7, 1-8, and 1-9 (SEQ ID NOs: 89, 92, and 95, respectively) encode the envelope protein, which is translated as part of a gag-pol-env polyprotein created by readthrough of the gag-pol stop codon. These sequences are more than 94% identical to each other and to the original SIRE1 described above. All three full-length DNA sequences contain long terminal repeats flanking the coding regions.
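ORF1 and ORF2 are described as separate reading frames joined only by readthrough of the gag-pol stop codon. Moreover, [0236] treats the codon after the pol stop as the start of the env-like ORF. A stop-to-stop scan is therefore a natural way to sketch how the two ORFs would be delimited. The Python sketch below is an illustrative assumption, not the authors' procedure. It scans the forward strand only, reports only stop-terminated segments, and uses a hypothetical minimum-length parameter.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(seq, min_codons=100):
    """Return (start, end, frame) for open segments ending in a stop codon.

    Scans the three forward frames; `end` is exclusive and includes the stop.
    Results are sorted longest first.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        seg_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if (i - seg_start) // 3 >= min_codons:
                    orfs.append((seg_start, i + 3, frame))
                seg_start = i + 3  # the next open segment begins after the stop
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)


# Toy "gag-pol / env" layout: two open segments separated by a single TAA.
toy = "ATG" + "AAA" * 5 + "TAA" + "CCC" * 4 + "TGA"
print(stop_to_stop_orfs(toy, min_codons=3))
```

With real SIRE1 sequences, a threshold on the order of hundreds of codons would separate the long ORF1 and ORF2 from short chance open segments.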

EXAMPLE 14

[0235] Sequence Alignment of SIRE1 Genes SIRE1-1, 1-7, 1-8, and 1-9

[0236] To determine the similarity among the identified SIRE1 sequences, the deduced open reading frames and intervening DNA of each SIRE1 sequence were aligned using CLUSTALW (Higgins, Thompson, and Gibson 1996). Size polymorphisms in the region between the env-like ORF and the 3′ LTR (bases 8200 to 8700) made alignment difficult, so this region was realigned manually. Gaps were inserted to maximize alignment of nearly identical blocks of duplicated nucleotides. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 2.1 (Kumar et al. 2001). DNA p-distances were used for closely related sequences (d < 0.05). Where appropriate, gamma distances were calculated using Kimura's 2-parameter method (Kimura 1980). To evaluate the ratio of synonymous to nonsynonymous substitutions (dS/dN), ORF1 was split into two parts: one encoding only the structural Gag protein(s), and one encoding PR, IN, and RT (Pol). The junction was placed 25 codons upstream of the conserved Asp-Ser-Gly, a putative protease active site. This position approximates the protease cleavage site for HIV (Pearl and Taylor 1987) as well as for Ty1 (Merkulov et al. 1996) and Ty3 (Kirchner and Sandmeyer 1993). To evaluate the dS/dN ratio for the env-like ORF, the codon immediately following the pol termination codon was designated the start codon. Codon-aligned nucleotide sequences were analyzed using SNAP (Nei and Gojobori 1986). Sequences in GenBank related to SIRE1, and those flanking SIRE1 insertions, were sought using BLASTn, tBLASTn, and tBLASTx (Altschul et al. 1997).

[0237] Analysis of the new elements sequenced from the genomic library indicated that SIRE1-8 comprises a full-length sequence of 9255 bp, while SIRE1-7 and SIRE1-9 are nearly complete copies of 9072 bp and 9352 bp, respectively. The sequences were aligned in their entirety by CLUSTALW, and neighbor-joining, minimum evolution (ME), and maximum parsimony trees were generated. The length variations among these elements in the LTR, ORF2, and the ORF2-LTR gap define two clearly differentiated groups: one comprising SIRE1-1 and SIRE1-8 (clade 1), and a second comprising SIRE1-7 and 1-9 (clade 2) (Tables 1 and 2) (FIG. 36).

TABLE 1
Summary of SIRE1 Structural Elements and Coding Regions

Element    Length (bp)   LTR (bp)   ORF1 (codons)   ORF2 (codons)   Post-ORF2 (bp)   Target site duplication
SIRE1-1     9295         1001       1578²           658²            527              AAATT⁴
SIRE1-7    >9072¹        1205       1577            683             632              ATTAC⁴
SIRE1-8     9255          999       1577            656             496              CACAT
SIRE1-9    >9352¹        1127       1577            681             615              ATTTG⁴

[0238] TABLE 2
Mean lengths of SIRE1 regions grouped by clade (± s.d.)

Elements   Clade   LTR (bp)      ORF2 (bp)    Gap (bp)
1, 8       1        967 ± 57     1973 ± 5     515 ± 17
7, 9       2       1175 ± 42     2044 ± 6     628 ± 16

[0239] The high degree of sequence conservation among the sequenced elements was confirmed by analysis of SIRE1 sequences in GenBank. A BLASTn search of the Genome Survey Sequence (GSS) database retrieved 57 additional SIRE1 elements from the sequenced ends of two soybean BAC libraries (Marek et al. 2001). The BAC-end sequences averaged 500 bp in length. Ten overlapping gag sequences were 97% identical on average, and the six sequences with similarity to the env-like gene shared 93% identity. Thus, the overall sequence similarity among SIRE1 elements is approximately 95%. These values are comparable to the sequence divergence observed in the corresponding regions of the fully sequenced SIRE1 elements (FIG. 36). Forty-eight of the 57 sequences (84%) contained reading frames uninterrupted by stop codons or frameshifts over their entire lengths. It has been estimated that there are approximately 1000 SIRE1 copies, which comprise 0.5 to 1% of soybean genomic DNA (Laten and Morris 1993). These copy-number estimates are consistent with the present recovery of 57 SIRE1 hits from the 6,146 sequences deposited in the GSS database. Hybridizations to arrays of soybean BAC clones also support these estimates.
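The consistency claim above can be checked with a few lines of arithmetic. The Python sketch below uses the counts quoted in the text. Two inputs are labeled assumptions and do not come from this paragraph: an element length of about 9.3 kb, taken from the Table 1 lengths, and a haploid soybean genome of about 1.1 Gb.

```python
# Figures quoted in the text.
gss_sequences, sire1_hits = 6146, 57
intact_frames, bac_end_hits = 48, 57

hit_fraction = sire1_hits / gss_sequences        # share of GSS reads hitting SIRE1
intact_fraction = intact_frames / bac_end_hits   # hits with uninterrupted reading frames

# Assumed inputs (not given in this paragraph): ~9.3 kb per element
# (Table 1 lengths) and a haploid soybean genome of ~1.1 Gb.
copies, element_bp, genome_bp = 1000, 9_300, 1.1e9
genome_fraction = copies * element_bp / genome_bp

print(f"GSS hit fraction:      {hit_fraction:.2%}")
print(f"Intact reading frames: {intact_fraction:.0%}")
print(f"Genome fraction:       {genome_fraction:.2%}")
```

Under these assumptions, both the read-level hit rate (about 0.9%) and the copy-number-based estimate (about 0.85%) fall within the quoted range of 0.5 to 1%. The comparison is only approximate, because a 500-bp read can overlap an element without lying entirely within it.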

[0240] Another measure of the relative age and diversity of the SIRE1 elements is the divergence between the two LTRs of the same element. The LTRs of a single retroelement are theoretically identical at the time of insertion, because they are reverse transcribed from the same template sequence. Once the element has integrated, changes in its LTR sequences should not be subject to selection, so their frequency should approximate the mutation rate. Alignment data showed that SIRE1-8, which contains two complete LTRs, had two base-pair differences between them. The elements truncated in the 3′ LTR, SIRE1-7 and 1-9, had zero and one base-pair differences, respectively.
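The LTR comparison above reduces to counting differences between two aligned sequences. The Python sketch below computes per-site divergence K while skipping gapped alignment columns. It also includes the commonly used conversion T = K / (2r) from divergence to time since insertion. That conversion assumes a constant neutral substitution rate r per site per year. The text does not give a rate, so `rate` must be supplied by the user, and the age function is an illustrative addition rather than part of the described analysis.

```python
def ltr_divergence(ltr5, ltr3):
    """Return (differences, sites compared, per-site divergence K) for aligned LTRs.

    Alignment columns containing a gap ("-") in either LTR are skipped.
    """
    compared = diffs = 0
    for x, y in zip(ltr5.upper(), ltr3.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs, compared, diffs / compared


def insertion_age(k, rate):
    """Approximate years since insertion, T = K / (2 * rate); rate must be supplied."""
    return k / (2 * rate)


# SIRE1-8: two differences between LTRs of ~999 bp (Table 1).
k_sire1_8 = 2 / 999
print(f"SIRE1-8 LTR divergence ~ {k_sire1_8:.4f}")
```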

[0241] Alignment of the LTRs and Putative Cis-Acting Sequences

[0242] The sequenced LTRs ranged in length from 902 bp to 1194 bp (Table 1). The length polymorphisms among LTRs are due primarily to tandem sequence duplications. The 5′ ends of the SIRE1-7 and SIRE1-9 LTRs share a 96-bp duplication whose copies are separated by five base pairs (FIG. 37). The distribution of this duplication parallels that of the length polymorphisms (see Table 2). In addition, the LTRs of SIRE1-7 have four tandem copies of an imperfect 20-bp repeat beginning at base 726; SIRE1-9 has three copies of the repeat, and SIRE1-8 contains two.

[0243] Both TDNN (Reese 2001) and ProScan (Prestridge 1995) predicted with high confidence that the sequence TATATAA (SEQ ID NO: 100) within the LTR sponsors transcriptional initiation at the adenine at base 630 (FIG. 37). This location lies approximately 300 bp upstream of the 5′ end of a previously characterized SIRE1 cDNA clone (Bi and Laten 1996) and is perfectly conserved among all members examined here. A conserved candidate polyadenylation signal resides upstream of the putative transcriptional start site (base 415 in the 5′ LTR). However, a full-length genomic transcript that used this site would lack the repeated region at its 5′ and 3′ ends that is necessary to sponsor strand transfer during reverse transcription. A slightly less favorable candidate polyadenylation signal is more appropriately located approximately 200 bp downstream of the proposed transcriptional start site (FIG. 37).

[0244] The LTRs contain several repeats of variable length that are suggestive of regulatory elements (FIG. 37). None of these repeats contained motifs resembling the cis-acting regulatory elements of characterized plant retrotransposons (Grandbastien et al. 1997; Takeda et al. 1999). Several, however, contained the sequence AAAG, which is the core binding site for Dof zinc-finger transcription factors (Yanagisawa and Schmidt 1999). Between bases 418 and 508, this tetranucleotide was detected five times in SIRE1-1 and SIRE1-8 and eight times in both SIRE1-7 and 1-9. The same sequence was also present at elevated density on the complementary strand (FIG. 37). Based on the overall DNA composition of the LTR, AAAG and CTTT would be expected to occur only 0.6 and 0.4 times, respectively, in this region. The AAAG cluster was densest between 95 and 185 bp upstream of the putative TATA box, a position typical of regulatory elements in other retrotransposons (Grandbastien et al. 1997; Takeda et al. 1999).
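The observed-versus-expected comparison above can be sketched as follows. The expected count of a motif in a window is the number of possible start positions multiplied by the product of the base frequencies. CTTT on the top strand is the same site as AAAG on the complementary strand. The function names are hypothetical, and the independence model (no dependence between neighboring bases) is an assumption; the text does not state how its expected values of 0.6 and 0.4 were computed.

```python
def count_motif(seq, motif):
    """Count overlapping occurrences of motif in seq (same strand)."""
    seq, motif = seq.upper(), motif.upper()
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def base_freqs(seq):
    """Frequencies of A, C, G, T, ignoring any other characters."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    return {b: acgt.count(b) / len(acgt) for b in "ACGT"}


def expected_count(motif, window_len, freqs):
    """Expected occurrences in a random window with independent bases."""
    p = 1.0
    for base in motif.upper():
        p *= freqs[base]
    return (window_len - len(motif) + 1) * p


# Hypothetical usage with a real LTR string `ltr` (1-based bases 418-508 -> slice 417:508):
#   window = ltr[417:508]
#   freqs = base_freqs(ltr)
#   observed = count_motif(window, "AAAG"), count_motif(window, "CTTT")
#   expected = expected_count("AAAG", len(window), freqs), expected_count("CTTT", len(window), freqs)
print(expected_count("AAAG", 91, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}))
```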

[0245] The tRNA primer binding site (PBS) in SIRE1 was determined to be complementary to soybean tRNAiMet (Bi and Laten 1996). Among the insertions sequenced, the clade 1 members SIRE1-1 and SIRE1-8 were complementary to 10 bases at the 3′ end of the tRNA, whereas the clade 2 elements SIRE1-7 and 1-9 were complementary to the first 12 bases. Interestingly, the first ten bases of the PBS (TGGTATCAGA) (SEQ ID NO: 101) were repeated just upstream of the 3′ end of the LTR in every SIRE1 member. The polypurine tract (PPT) lies adjacent to the 3′ LTR and has the sequence AAAGGGGGAGA (SEQ ID NO: 102). No sequence polymorphisms were detected within the PPT or in the 50 bp upstream of it.
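Locating these landmarks in a new element is a simple string search. The following Python sketch finds every copy of the ten-base PBS core and of the PPT in an element sequence. It also tests whether a PBS is the exact reverse complement of a supplied tRNA 3′ segment, given as RNA or DNA. The function names are hypothetical, and the soybean tRNA sequence itself is not given in the text, so the check takes the tRNA segment as user input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PBS_CORE = "TGGTATCAGA"  # first ten PBS bases (SEQ ID NO: 101)
PPT = "AAAGGGGGAGA"      # polypurine tract (SEQ ID NO: 102)


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(seq, motif):
    """0-based start positions of all (possibly overlapping) occurrences."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def pbs_matches_trna(pbs, trna_3prime):
    """True if pbs (DNA) is the exact reverse complement of a tRNA 3' segment."""
    return pbs.upper() == revcomp(trna_3prime.upper().replace("U", "T"))


# Toy element: PBS core at position 20, PPT at position 60.
toy = "C" * 20 + PBS_CORE + "C" * 30 + PPT + "C" * 5
print(find_all(toy, PBS_CORE), find_all(toy, PPT))
```

In a full-length element, `find_all(element, PBS_CORE)` would be expected to return two hits: the PBS itself and the repeat just upstream of the 3′ end of the LTR described above.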

[0246] Alignment of gag-pol Sequences

[0247] A consensus sequence of SIRE1 elements encodes Gag and Pol on a single open reading frame, which is presumably translated as a single polyprotein. Gag-Pol contains the invariant amino acid residues and conserved motifs found in most Ty1-copia class retrotransposons (Peterson-Burch and Voytas 2002). These include a zinc finger-like Cys-Cys-His-Cys (SEQ ID NO: 103) motif in the presumed nucleocapsid protein (SIRE1 has two), an Asp-Ser-Gly motif in the catalytic site of the protease, His-His-Cys-Cys (SEQ ID NO: 104) and Asp-Asp-35-Glu motifs in IN, and several conserved domains within RT.

[0248] Alignment analysis showed strong conservation of the SIRE1 gag-pol coding region, with identity ranging from 95 to 99% (average 98%). SIRE1-1 was shown to contain a single nonsense mutation, and some of the nucleotide changes likely compromise SIRE1 function. Despite these obvious mutations, the six short insertions or deletions (indels) that have occurred all preserve the reading frame. All but one of these indels are located in the first 1700 bp of ORF1, within the Gag and PR coding regions. In addition, the dS/dN ratio was calculated, i.e., the number of synonymous substitutions per synonymous site (dS) divided by the number of nonsynonymous substitutions per nonsynonymous site (dN); a ratio greater than 1 indicates that changes altering the amino acid sequence have been removed by selection. For gag, defined as the coding region from the presumed start codon to 25 amino acids upstream of the protease active site, the average dS/dN ratio among elements was 3.90, denoting selective constraint at most sites. Selective constraint on pol was considerably stronger, with a dS/dN ratio of 7.45.
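
The dS/dN calculation can be sketched with a simplified Nei-Gojobori (1986) counting procedure followed by a Jukes-Cantor correction. This section does not state which method or software was used (Nei and Gojobori 1986 and MEGA2 both appear in the reference list), so the Python sketch below is an assumption for illustration: it uses the standard genetic code, counts changes to a stop codon as nonsynonymous when computing sites, skips codons containing gaps, ambiguous bases, or stops, and averages over mutational pathways that avoid intermediate stop codons.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_sites(codon):
    """(synonymous, nonsynonymous) sites of a single codon."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over pathways."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    total_s = total_n = 0.0
    paths = 0
    for order in permutations(diff):
        current, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":  # pathway passes through a stop codon
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            total_s, total_n, paths = total_s + s, total_n + n, paths + 1
    if paths == 0:
        return 0.0, float(len(diff))  # convention: count all as nonsynonymous
    return total_s / paths, total_n / paths


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("proportion too high for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ds_dn(seq1, seq2):
    """Return (dS, dN) for two codon-aligned coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue  # gap, ambiguous base, or stop codon
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        S += (s1 + s2) / 2
        N += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no comparable codons")
    return jukes_cantor(Sd / S), jukes_cantor(Nd / N)
```

The ratio for a pair of elements would then be `ds / dn` from `ds, dn = ds_dn(a, b)`; averaging such ratios over element pairs is one way, among several, to obtain summary values of the kind reported above.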

[0249] The env-Like Gene

[0250] The env-like gene is in the same reading frame as gag-pol and is separated from gag-pol by a single stop codon. Immediately following the stop codon is a nucleotide sequence motif, CA(A/G)(T/C)RYTA, known to facilitate stop codon suppression in tobacco mosaic virus (Skuzeski et al. 1991) and several other single-stranded RNA plant viruses (Beier and Grimm 2001). Although no Pol-Env fusions have been described in other retroelements, constructs carrying this sequence promoted readthrough of the SIRE1 pol stop codon in vivo (Havecker and Voytas 2003).
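
The in-frame stop codon and its 3′ context can be checked with a short Python sketch. The motif string CARYRYTA is a literal IUPAC transcription of CA(A/G)(T/C)RYTA as written above (R = A/G, Y = C/T); the function names and test sequences are hypothetical, and the sketch only flags candidate sites without predicting readthrough efficiency.

```python
import re

# IUPAC codes needed for the motif; R = A/G, Y = C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
STOPS = {"TAA", "TAG", "TGA"}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC nucleotide motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def readthrough_candidates(orf: str, motif: str = "CARYRYTA") -> list:
    """0-based positions of in-frame stop codons immediately followed by motif."""
    orf = orf.upper()
    pattern = re.compile(iupac_to_regex(motif))
    hits = []
    for i in range(0, len(orf) - 2, 3):
        # Pattern.match with a pos argument anchors the match at i + 3.
        if orf[i:i + 3] in STOPS and pattern.match(orf, i + 3):
            hits.append(i)
    return hits
```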

[0251] The length polymorphisms in env result primarily from eleven in-frame indels, all but one of which are confined to the first 550 bp and the last 300 bp of this 2080-bp ORF. Of the 285 polymorphic nucleotide sites, one quarter are located within the first 300 bp of the coding region.
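
Counting polymorphic sites in an alignment, and the share that falls within a leading segment, might be sketched as below. This Python sketch assumes equal-length aligned sequences, treats a column as polymorphic when it holds more than one distinct base (gaps and N are ignored), and measures positions in alignment columns rather than in the coordinates of any single element; these are assumptions, not the procedure stated in the specification.

```python
def polymorphic_columns(alignment: list) -> list:
    """0-based alignment columns containing more than one distinct base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    columns = []
    for col in range(length):
        bases = {seq[col].upper() for seq in alignment} - {"-", "N"}
        if len(bases) > 1:
            columns.append(col)
    return columns


def fraction_in_leading_segment(columns: list, segment_len: int) -> float:
    """Fraction of polymorphic columns lying in the first segment_len columns."""
    if not columns:
        return 0.0
    return sum(1 for c in columns if c < segment_len) / len(columns)
```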

[0252] For env, the nucleotide sequences were codon-aligned before calculating the dS/dN ratio, which averaged 3.29 across element pairs. Three motifs analogous to structural elements of retroviral envelope proteins were previously identified in the conceptual translation of this ORF: a transmembrane domain, a fusion peptide, and a coiled-coil domain (Laten, Majumdar, and Gaucher 1998). The putative 19-amino acid fusion peptide was perfectly conserved among all sequenced elements, and the presumed 32-residue coiled coil has only two polymorphic positions, neither of which alters the heptad repeat pattern. The amino-terminal transmembrane domain is polymorphic at 16 of 24 residues, yet all variants are predicted with strong confidence to be membrane-spanning peptides (Table 3).
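
The specification does not state which program produced the transmembrane predictions in Table 3 (the reference list cites Hofmann and Stoffel 1993). As a swapped-in and much simpler stand-in, the Python sketch below applies a Kyte-Doolittle hydropathy sliding window with the commonly used window of 19 residues and threshold of 1.6. It only illustrates why a highly polymorphic but consistently hydrophobic segment can still be called membrane-spanning; the test peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(peptide: str, window: int = 19) -> float:
    """Highest mean hydropathy over any contiguous window of the peptide."""
    values = [KD[aa] for aa in peptide.upper()]
    if len(values) < window:
        raise ValueError("peptide shorter than window")
    current = sum(values[:window])
    best = current
    for i in range(window, len(values)):
        # Slide the window one residue: add the new one, drop the oldest.
        current += values[i] - values[i - window]
        best = max(best, current)
    return best / window


def looks_membrane_spanning(peptide: str, window: int = 19,
                            threshold: float = 1.6) -> bool:
    """Crude transmembrane call: any window averaging above threshold."""
    return max_window_hydropathy(peptide, window) > threshold
```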

[0253] The presence of env-like ORFs in SIRE1 and some Ty3/gypsy retroelements has raised speculation that these elements may be retroviruses. The functional role of an envelope protein in viral propagation within a plant host is unknown, and plant cell walls preclude membrane fusion as an invasive strategy. However, env genes are not unusual in plant viruses. All enveloped plant viruses use invertebrate vectors, in which the glycosylated envelope proteins mediate host cell recognition and membrane fusion (VandenHeuvel, Franz, and van der Wilk 2002). The envelope protein has been shown to be dispensable in the plant host (Goldbach and Peters 1996): when tospoviruses, the plant-infecting members of the Bunyaviridae, are maintained solely by mechanical inoculation of host plants, isolates can be recovered that carry point and frameshift mutations in the glycoprotein gene and lack functional envelope proteins. These isolates are active in the plant host but fail to re-infect the native thrips host (Goldbach and Peters 1996; Nagata et al. 2000).

[0254] The Interval Between the env-Like Gene and the 3′ LTR

[0255] The most variable region in SIRE1 lies immediately downstream of the env-like gene and extends to within 100 bp of the PPT adjacent to the 3′ LTR (FIG. 38). Variation is primarily in the form of a complex pattern of sequence duplications, ranging from simple trinucleotide repeats to imperfect tandem duplications of 100 bp. One feature shared by many of these duplications is the presence of PPT-like sequences.

[0256] Sequence alignment demonstrated that, between bases 8176 and 8845, each SIRE1 member contained four to six copies of the sequence AGGGGGAG (SEQ ID NO: 105). Another shared feature is the presence of short duplications bordering the indels. The region between the env-like ORF and the 3′ LTR varies in length from 496 to 636 bp. The sequence duplications in this region are unusual but not unprecedented among retroelements.
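
Locating and counting the PPT-like copies within a coordinate window can be sketched in Python as follows. Exact matching and 1-based inclusive coordinates (as in 8176-8845 above) are assumptions; imperfect copies would need an approximate-matching step that is not shown, and the function names are hypothetical.

```python
def motif_positions(seq: str, motif: str, start: int, end: int) -> list:
    """1-based start positions of exact motif copies lying wholly within
    the 1-based inclusive window [start, end] of seq (overlaps allowed)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return [i + 1 for i in range(start - 1, end - k + 1)
            if seq[i:i + k] == motif]


def spacings(positions: list) -> list:
    """Distances between consecutive copies, useful for spotting tandem arrays."""
    return [b - a for a, b in zip(positions, positions[1:])]
```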

[0257] The best explanation for the gain and loss of these repeats is replication slippage (Viguera, Canceill, and Ehrlich 2001). Because strand transfer is a requisite component of retrovirus and retrotransposon replication, some slippage by RT at internal regions is quite plausible. Re-initiation at nearby similar or duplicated sequences, upstream or downstream, would be expected to generate the kinds of duplications and subsequent deletions that pervade retroviral genomes (Temin 1993). The presence of tandem triplet repeats, and of direct repeats of 4 to 7 bp flanking several of the gaps (FIG. 38), is consistent with this explanation. Indeed, long direct repeats in retroviral DNAs are deleted at high frequency (Rhode, Emerman, and Temin 1987).
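
A deletion produced by slippage between direct repeats typically leaves one copy of the repeat, so the bases at the start of the deleted segment match those immediately after it. The Python sketch below measures that flanking match for a deletion given in 0-based, half-open coordinates on the longer sequence; it is an illustrative check with hypothetical names and does not model how RT actually re-initiates.

```python
def flanking_direct_repeat(seq: str, del_start: int, del_end: int) -> str:
    """Longest direct repeat consistent with slippage for a deletion of
    seq[del_start:del_end]: the prefix of the deleted segment that also
    occurs immediately after the deletion."""
    seq = seq.upper()
    max_len = min(del_end - del_start, len(seq) - del_end)
    k = 0
    while k < max_len and seq[del_start + k] == seq[del_end + k]:
        k += 1
    return seq[del_start:del_start + k]
```

A returned repeat of 4 to 7 bases would correspond to the short flanking direct repeats described above.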

[0258] Flanking Sequences

[0259] The DNA adjacent to the SIRE1 elements was analyzed. SIRE1-8 was flanked by 5-bp direct repeats with the sequence CACAT. The 5-bp sequences found adjacent to the single LTRs of two other members are shown in Table 1. There does not appear to be a recognizable pattern among these sequences.
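
Whether an insertion is bordered by a target site duplication, as with the 5-bp CACAT repeats flanking SIRE1-8, can be checked with a simple Python comparison. The element boundaries must already be known (for example, from the LTR termini); the function name and test sequences are hypothetical, and only exact duplications are reported.

```python
def target_site_duplication(seq: str, elem_start: int, elem_end: int,
                            k: int = 5):
    """Return the k-bp duplication flanking an element, or None.

    The element occupies seq[elem_start:elem_end] (0-based, half-open);
    the k bases immediately upstream are compared with the k bases
    immediately downstream.
    """
    if elem_start < k or elem_end + k > len(seq):
        return None  # not enough flanking sequence
    left = seq[elem_start - k:elem_start].upper()
    right = seq[elem_end:elem_end + k].upper()
    return left if left == right else None
```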

[0260] SIRE1-1 is adjacent to the gag-pol region of a member of diaspora, a Ty3-gypsy-like retroelement family (GenBank Accession No. AF095730). None of the other flanking DNAs herein contained extended ORFs, nor did BLASTn or tBLASTx database searches generate significant hits.

[0261] The flanking DNAs of ten SIRE1 insertions were sequenced; two belong to identified plant members of the Ty3-gypsy family. Of the remaining eight, one is flanked on either side by members of two different repetitive families, and one is an apparent paralog of a single BAC-end sequence. The identities of the rest are unknown. These results suggest clustering and/or nesting of some high copy-number retroelements in G. max, similar to what has been reported for other plant genomes (Bennetzen 2000).

[0262] The observed sequence variation among SIRE1 genes suggests that the elements may have diverse biological functions. The majority of sequence diversity was detected within the non-coding regions, namely the LTRs and the spacer region between the env-like ORF and the 3′ LTR. Particularly evident were tandem sequence duplications in the 5′ portion of the LTR that result in length polymorphisms ranging from 902 to 1205 bp. The shorter duplications contained multiple candidate binding sites for the Dof zinc finger transcription factor just upstream of the putative promoter. Dof proteins regulate a broad spectrum of target genes in both monocots and dicots, including genes that are auxin-regulated (Baumann et al. 1999; Kisu et al. 1997), light-responsive (Yanagisawa and Sheen 1998), and stress-induced (Zhang et al. 1995). Stress conditions and defense elicitors are known to induce Tnt1, Tto1, and Tos17 (Grandbastien et al. 1997; Hirochika et al. 1996; Takeda et al. 1998). Repetition of putative cis-acting sequence motifs in LTRs has been noted in four actively transcribed elements, BARE1, Tos17, Tnt1, and Tto1 (Grandbastien et al. 1997; Hirochika et al. 1996; Suoniemi, Narvanto, and Schulman 1996; Takeda et al. 1999). For the latter two, the repeated motifs have been shown experimentally to sponsor inducible element expression (Grandbastien et al. 1997; Takeda et al. 1999), and a MYB-related transcription factor was shown to interact with and regulate Tto1 at these motifs (Sugimoto, Takeda, and Hirochika 2000). In barley, the MYB transcription factor GAMYB interacts with the Dof transcription factor BPBF to regulate endosperm-specific genes (Diaz et al. 2002). Interestingly, the SIRE1 LTRs contain two potential MYB-binding sites just upstream of the AAAG-dense region (FIG. 37).

[0263] From the foregoing it may be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without deviating from the spirit and scope of the invention (as set out in the appended claims).

[0264] All of the above U.S. patents, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification are incorporated herein by reference, in their entirety.

[0265] References Cited

[0266] The following publications, which were cited in the specification, are incorporated in their entirety by reference herein.

[0267] Ahlquist, P., R. French, J. J. Bujarski. Molecular studies of Brome mosaic virus using infectious transcripts from cloned cDNA. Adv. Virus Res. 32:214-242 (1987).

[0268] Ahlquist, P., R. F. Pacha. Gene amplification and expression by RNA viruses and potential for further application to plant gene transfer. Physiol. Plant. 79:163-167 (1990).

[0269] Altenbach, S. B., K. W. Pearson, G. Meeker, L. C. Staraci, and S. S. M. Sun. Enhancement of the methionine content of seed proteins by the expression of a chimeric gene encoding a methionine-rich protein in transgenic plants. Plant Mol. Biol. 13:513 (1989).

[0270] Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.

[0271] Amberger, L. A., R. G. Palmer and R. C. Shoemaker. Analysis of culture-induced variation in soybean. Crop Sci. 32:1103-1108 (1992).

[0272] Ashfield, T., N. T. Keen, R. I. Buzzell, R. W. Innes. 1995. Soybean resistance genes specific for different Pseudomonas syringae avirulence genes are allelic, or closely linked, at the RPG1 locus. Genetics 141:1597.

[0273] Baltazar, M B, Mansur, L. 1992. Identification of restriction fragment length polymorphisms to map soybean cyst nematode resistance genes in soybean. Soybean Genet. Newslett. 19: 120.

[0274] Baumann, K., A. De Paolis, P. Costantino and G. Gualberti. 1999. The DNA binding site of the Dof protein NtBBF1 is essential for tissue-specific and auxin-regulated expression of the rolB oncogene in plants. Plant Cell 11:323-333.

[0275] Beachy, R. N. 1990. Plant transformation to confer resistance against virus infection, in Gene Manipulation in Plant Improvement, Vol. 2, Gustafson, J. P., ed., Plenum Press, New York.

[0276] Beier, H. and M. Grimm. 2001. Misreading of termination codons in eukaryotes by natural nonsense suppressor tRNAs. Nucleic Acids Res. 29:4767-4782.

[0277] Bennetzen, J. L. 2000. Transposable element contributions to plant gene and genome evolution. Plant Mol. Biol. 42:251-269.

[0278] Berg, D. E. and M. M. Howe, eds. 1989. Mobile DNA, ASM, Washington, D.C.

[0279] Bernard, R. L., Cremeens, C. R. 1971. A gene for general resistance to downy mildew of soybeans. J. Hered. 62:359.

[0280] Bi, Y.-A. and H. M. Laten. 1996. Sequence analysis of a cDNA containing the gag and prot regions of the soybean retrovirus-like element, SIRE-1. Plant Mol. Biol. 30:1315.

[0281] Boeke, J. D. 1989. Transposable elements in Saccharomyces cerevisiae. In Mobile DNA, D. E. Berg and M. M. Howe, eds., ASM, Washington, D.C., pp. 335-374.

[0282] Boerma, H R, Harris, B B, Kuhn, C W. 1975. Inheritance of resistance to cowpea chlorotic mottle virus in soybeans. Crop Sci. 15:849.

[0283] Boutin, S, Ansari, H, Concibido, V, Denny, R, Orf, J, Young, N. 1992. RFLP analysis of cyst nematode resistance in soybeans. Soybean Genet. Newslett. 19: 123.

[0284] Brettell, R. I. S. and E. S. Dennis. 1991. Reactivation of a silent Ac following tissue culture is associated with heritable alterations in its methylation pattern. Mol. Gen. Genet. 229, 365-372.

[0285] Brisson, N., J. Paszkowski, J. R. Penswick, B. Gronenborn, I. Potrykus, T. Hohn. 1984. Expression of a bacterial gene in plants by using a viral vector. Nature 310, 511-514.

[0286] Britten, R. J., Proc. Natl. Acad. Sci. USA 92, 599 (1995).

[0287] Britten, R. J., T. J. McCormack, T. L. Mears, E. H. Davidson, J. Mol. Evol. 40, 13 (1995).

[0288] Brunke, K. J. and R. L. Meeusen. 1991. Insect control with genetically engineered crops. Trends Biotechnol. 9, 197.

[0289] Bureau, T. E., S. E. White, S. R. Wessler, Cell 77:479 (1994).

[0290] Burmeister, M. and H. Lehrach. Trends Genet. 12:389 (1996).

[0291] Buss, G. R., Roane, C. W., Tolin, S. A., Vinardi, T. A. 1985. A second dominant gene for resistance to peanut mottle virus in soybeans. Crop Sci. 25:314.

[0292] Cai, H. and M. Levine. 1995. Modulation of enhancer-promoter interactions by insulators in the Drosophila embryo. Nature 376:533-536.

[0293] Casacuberta, J. M., S. Vernhettes and M.-A. Grandbastien. 1995. Sequence variability within the tobacco retrotransposon Tnt1 population. EMBO J. 14, 2670-2678.

[0294] Cavarec, L., S. Jensen and T. Heidmann. 1994. Identification of a strong transcriptional activator for the copia retrotransposon responsible for its differential expression in Drosophila hydei and melanogaster cell lines. Biochem. Biophys. Res. Commun. 20-31, 392-399.

[0295] Cavarec, L. and T. Heidmann. 1993. The Drosophila copia retrotransposon contains binding sites for transcriptional regulation by homeoproteins. Nucl. Acids Res. 21, 5041-5049.

[0296] Chambers, P., C. R. Pringle, A. J. Easton, J. Gen. Virol. 71, 3075 (1990).

[0297] Chan, D. C., D. Fass, J. M. Berger, P. S. Kim, Cell 89, 263 (1997).

[0298] Chen, P., Buss, G. R., Tolin, S. A. 1993. Resistance to soybean mosaic virus conferred by two independent dominant genes in PI 486355. J. Hered. 84: 25.

[0299] Choi, S.-Y. and D. V. Faller. 1994. The long terminal repeats of a murine retrovirus encode a trans-activator for cellular genes. J. Biol. Chem. 269, 19691-19694.

[0300] Dahlberg, J. E., R. C. Sawyer, J. M. Taylor, A. J. Faras, W. E. Levinson, H. M. Goodman, and J. M. Bishop. 1974. Transcription of DNA from the 70S RNA of Rous sarcoma virus. I. Identification of a specific 4S RNA which serves as primer. J. Virol. 13:1126-1133.

[0301] Dalgleish, A. G., P. C. L. Beverly, P. R. Clapham, D. H. Crawford, M. F. Greaves, and R. A. Weiss. 1984. The CD4 antigen is an essential component of the receptor for the AIDS retrovirus. Nature 312, 763-767.

[0302] Day, A. G., E. R. Bejarano, K. W. Buck, M. Burrell, and C. P. Lichtenstein. 1991. Expression of an antisense viral gene in transgenic tobacco confers resistance to the DNA virus tomato golden mosaic virus. Proc. Natl. Acad. Sci. U.S.A. 88, 6721.

[0303] Deleage, G., and B. Roux, Prot. Engng. 1, 289 (1987).

[0304] della-Cioppa, G., S. C. Bauer, M. L. Taylor, D. E. Rochester, B. K. Klein, D. M. Shah, R. T. Fraley, and G. M. Kishore. 1987. Targeting a herbicide resistant enzyme from Escherichia coli to chloroplasts of higher plants. Bio/Technology 5, 579.

[0305] Di, R., V. Purcell, G. B. Collins, S. A. Ghabrial. 1996. Production of transgenic soybean lines expressing the bean pod mottle virus coat protein precursor gene. Plant Cell Reports 15:746.

[0306] Diaz, I., J. Vicente-Carbajosa, Z. Abraham, M. Martinez, I. Isabel-La Moneda and P. Carbonero. 2002. The GAMYB protein from barley interacts with the DOF transcription factor BPBF and activates endosperm-specific genes during seed development. Plant J. 29:453-464.

[0307] Dickinson, C. D., M. P. Scott, E. H. A. Hussein, P. Argos, and N. C. Nielsen. 1990. Effect of structural modifications on the assembly of a glycinin subunit. Plant Cell 2, 403.

[0308] Diers, B. W., Mansur, L., Imsande, J., Shoemaker, R. C. 1992. Mapping phytophthora resistance loci in soybean with restriction fragment length polymorphism markers. Crop Sci. 32: 377.

[0309] Eickbush, T. H., in The Evolutionary Biology of Viruses, S. S. Morse, Ed. (Raven Press, New York, 1994) pp. 121-157.

[0310] Engels, W. R. 1989. P elements in Drosophila melanogaster. In Mobile DNA, D. E. Berg and M. Howe, eds., ASM, Washington, D.C., pp. 437-484.

[0311] Fass, D., S. C. Harrison, P. S. Kim, Nature Struct. Biol. 3, 465 (1996).

[0312] Fedoroff, N. V. 1989. Maize transposable elements. In Mobile DNA, D. E. Berg and M. M. Howe, eds., ASM, Washington, D.C., pp. 375-411.

[0313] Felder, H., A. Herzceg, Y. deChastonay, P. Aeby, H. Tobler, F. Muller, Gene 149, 219 (1994).

[0314] Finnegan, D. J. 1989. Eukaryotic transposable elements and genome evolution. Trends Genet. 5, 103-107.

[0315] Flavell, A. J., D. B. Smith and A. Kumar. 1992. Extreme heterogeneity of Ty1-copia group retrotransposons in plants. Mol. Gen. Genet. 231, 233-242.

[0316] Flavell, A. J., V. Jackson, M. P. Iqbal, I. Riach, S. Waddell, Mol. Gen. Genet. 246, 65 (1995).

[0317] Fontenot, J. D., N. Tjandra, C. Ho, P. C. Andrews, R. C. Montelaro, J. Biomol. Struct. Dynam. 11, 821 (1994).

[0318] Freytag, A. H., A. P. Rao-Arelli, S. C. Anand, I. A. Wrather and L. D. Owens. 1989. Somaclonal variation in soybean plants regenerated from tissue culture. Plant Cell Rep. 8, 199-202.

[0319] Friesen, P. D., and M. S. Nissen, Mol. Cell. Biol. 10, 3067 (1990).

[0320] Gallaher, W. R., J. M. Ball, R. F. Garry, A. M. Martin-Amedee, R. C. Montelaro, AIDS Res. Hum. Retroviruses 11, 191 (1995).

[0321] Gallaher, W. R., J. M. Ball, R. F. Garry, M. C. Griffin, R. C. Montelaro, AIDS Res. Hum. Retroviruses 5, 431 (1989).

[0322] Georgiev, P. G. and V. G. Corces. 1995. The su(Hw) protein bound to gypsy sequences in one chromosome can repress enhancer-promoter interactions in the paired gene located on the other homolog. Proc. Natl. Acad. Sci. USA 92, 5184-5188.

[0323] Georjon, C., and G. Deleage, Comput. Applic. Biosci. 11, 681(1995).

[0324] Georjon, C., and G. Deleage, Prot. Engng. 7, 157 (1994).

[0325] Geyer, P. K. and V. G. Corces. 1992. DNA position-specific repression of transcription by a Drosophila zinc finger protein. Genes Dev. 6, 1865-1873.

[0326] Gibrat, J. F., J. Garnier, B. Robson, J. Mol. Biol. 198, 425 (1987).

[0327] Gijzen, M., T. MacGregor, M. Bhattacharyya, R. Buzzell. 1996. Temperature-induced susceptibility to Phytophthora sojae in soybean isolines carrying different RPS genes. Physiol. Mol. Plant Path. 48:209.

[0328] Goldbach, R. and Peters, D. 1996. Molecular and Biological Aspects of Tospoviruses. Pp. 129-157 in R. M. Elliot, ed. The Bunyaviridae. Plenum Press, New York.

[0329] Golemboski, D. B., G. P. Lomonossoff, and M. Zaitlin. 1990. Plants transformed with a tobacco mosaic virus nonstructural gene sequence are resistant to the virus. Proc. Natl. Acad. Sci. U.S.A. 87, 6311.

[0330] Grandbastien, M.-A. 1992. Retroelements in higher plants. Trends Genet. 8, 103-108.

[0331] Grandbastien, M. A., H. Lucas, J. B. Morel, C. Mhiri, S. Vernhettes and J. M. Casacuberta. 1997. The expression of the tobacco Tnt1 retrotransposon is linked to plant defense responses. Genetica 100:241-252.

[0332] Grandbastien, M.-A., A. Spielmann and M. Caboche. 1989. Tnt1, a mobile retroviral-like transposable element of tobacco isolated by plant cell genetics. Nature 337, 376-380.

[0333] Graybosch, R. A., N. E. Edge and X. Delannay. 1987. Somaclonal variation in soybean plants regenerated from cotyledonary node tissue culture system. Crop Sci. 27, 803-806.

[0334] Gresshoff, P. M. and D. Landau-Ellis. 1994. Molecular mapping of soybean nodulation genes. In Plant Genome Analysis, P. Gresshoff, ed., CRC Press, Boca Raton, pp. 97-112.

[0335] Groose, R. W. and R. G. Palmer. 1987. New mutations in a genetically unstable line of soybeans. Soybean Genet. Newsl. 14, 164-1610.

[0336] Groose, R. W., H. D. Weigelt and R. G. Palmer. 1988. Somatic analysis of unstable mutation for anthocyanin pigmentation in soybean. J. Hered. 79, 263-267.

[0337] Urnovitz, H. B. and W. H. Murphy, Clin. Microbiol. Rev. 9, 72 (1996).

[0338] Hagen, G., and T. Guilfoyle. 1985. Rapid induction of selective transcription by auxins. Mol. Cell Biol. 5, 1197.

[0339] Harlow, E., and D. Lane. 1985. Antibodies: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

[0340] Hartwig, E. E., Bromfield, K. R. 1983. Relationships among three genes conferring specific resistance to rust in soybeans. Crop Sci. 23: 237.

[0341] Haughn, G. W., et al. 1988. Mol. Gen. Genet. 211, 266.

[0342] Havecker, E. A. and Voytas, D. F. 2003. The soybean retroelement SIRE1 uses stop codon suppression to express its envelope-like protein. EMBO Rep., in press.

[0343] Hemenway, C., R.-X. Fang, W. K. Kaniewski, N.-H. Chua, and N. E. Turner. 1988. Analysis of the mechanism of insect resistance engineered into tobacco. Nature 330, 160.

[0344] Higgins, D. G., J. D. Thompson and T. J. Gibson. 1996. Using CLUSTAL for multiple sequence alignments. Meth. Enzymol. 266:383-402.

[0345] Hill, K. K., N. Jarvis-Eagan, E. L. Halk, K. J. Krahn, L. W. Liao, R. S. Mathewson, D. J. Merlo, S. E. Nelson, K. E. Rashka, and L. S. Loesch-Fries. 1991. The development of virus-resistant alfalfa, Medicago sativa L. Bio/Technology 9, 373.

[0346] Hirochika, H. 1993. Activation of tobacco retrotransposons during tissue culture. EMBO J. 12, 2521-2528.

[0347] Hirochika, H., K. Sugimoto, Y. Otsuki, H. Tsugawa and M. Kanda. 1996. Retrotransposons of rice involved in mutations induced by tissue culture. Proc. Natl. Acad. Sci. USA 93:7783-7788.

[0348] Hoffman, L. M., D. D. Donaldson, and E. M. Herman. 1988. A modified storage protein is synthesized, processed, and degraded in the seed of transgenic plants. Plant Mol. Biol. 11, 717.

[0349] Hofmann, K., and W. Stoffel, Biol. Chem. Hoppe-Seyler 347, 166(1993).

[0350] Horsch, R. B., et al. 1984. Science 223, 496.

[0351] Hsu, H. T., and R. H. Lawson. 1991. Direct tissue blotting for detection of tomato spotted wilt virus in impatiens. Plant Dis. 75, 292.

[0352] Hu, W., O. P. Das and J. Messing. 1995. Zeon-1, a member of a new maize retrotransposon family. Mol. Gen. Genet. 248, 471-480.

[0353] Hunter, E., and R. Swanstrom, Curr. Top. Microbiol. Immunol. 157, 187 (1990).

[0354] Hutchinson III, C. A., S. C. Hardies, D. D. Loeb, W. R. Shehee and M. H. Edgell. 1989. LINES and related retroposons: long interspersed repeated sequences in the eucaryotic genome. In Mobile DNA, D. E. Berg and M. M. Howe, eds., ASM, Washington, D.C., pp. 593-617.

[0355] Inouye, S., S. Yuki, K. Saigo, Eur. J. Biochem. 154, 417 (1986).

[0356] Johns, M. A., J. Mottinger and M. Freeling. 1985. A low copy number, copia-like transposon in maize. EMBO J. 4, 1093-1102.

[0357] Kaeppler, S. M. and R. L. Phillips. 1993. Tissue culture-induced DNA methylation variation in maize. Proc. Natl. Acad. Sci. USA 90, 8773-8776.

[0358] Kasuga, T, Gijzen, N C, Buzzelli, R, Bhattacharyya, M. 1996. Isolation and mapping of amplified fragment length polymorphisms (AFLP) DNA markers that are linked to the RPS I locus of soybean. (Abstract) Plant Genome IV, San Diego, 1996.

[0359] Katz, R. A. and J. E. Jentoft. 1989. What is the role of the Cys-His motif in retroviral nucleocapsid (NC) proteins? Bioessays 11, 176-181.

[0360] Keen, N T, Buzzell, R I. 1991. New disease resistance genes in soybean against Pseudomonas syringae pv glycinea: evidence that one of them interacts with a bacterial elicitor. Theor. Appl. Genet. 81: 133.

[0361] Keim, P, Schupp, J M, Ferreira, A, Zhu, T, Shi, L, Travis, S E, Clayton, K, Webb, D M. 1996. A high density soybean genetic map using RFLP, RAPD, and AFLP genetic markers. (Abstract) Plant Genome IV, San Diego, 1996.

[0362] Kilen, T C, Hartwig, E E. Identification of single genes controlling resistance to stem canker in soybean. Crop Sci. 27: 863.

[0363] Kim, A., C. Terzian, P. Santamaria, A. Pelisson, N. Prudhomme, A. Bucheton, Proc. Natl. Acad. Sci. USA 91, 1285 (1994).

[0364] Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

[0365] King, C. C. 1992. Modular transposition and the dynamic structure of eukaryotic regulatory evolution. Genetica 86, 127-142.

[0366] Kirchner, J. and S. Sandmeyer. 1993. Proteolytic processing of Ty3 proteins is required for transposition. J. Virology 67:19-28.

[0367] Kisu, Y., Y. Harada, M. Goto and M. Esaka. 1997. Cloning of the pumpkin ascorbate oxidase gene and analysis of a cis-acting region involved in induction by auxin. Plant Cell Physiol. 38:631-637.

[0368] Klimyuk, V. I., B. J. Carroll, C. M. Thomas and J. D. Jones. 1993. Alkali treatment for rapid preparation of plant material for reliable PCR analysis. Plant J. 3:493-494.

[0369] Kumar, S., K. Tamura, I. B. Jakobsen and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.

[0370] Laten, H. M. and R. O. Morris. 1993. SIRE-1, a long interspersed repetitive DNA element from soybean with weak sequence similarity to retrotransposons: initial characterization and partial sequence. Gene 134, 153-159.

[0371] Laten, H. M., A. Majumdar and E. A. Gaucher. 1998. SIRE-1, a copia/Ty1-like retroelement from soybean, encodes a retroviral envelope-like protein. Proc. Natl. Acad. Sci. USA 95:6897-6902.

[0372] Lee, S-H, Tamulonis, J, Bailey, M, Man, R, Ashley, D, Parrott, W, Boerma, R, Carter, Jr, T, Shipe, E, Hussey, R. 1996. Molecular markers associated with soybean seed protein and oil across populations and locations. (Abstract) Plant Genome IV, San Diego, 1996.

[0373] Lee, W. S., J. T. C. Tzen, J. C. Kridl, S. E. Radke, and A. H. C. Huang. 1991. Maize oleosin is correctly targeted to seed oil bodies in Brassica napus transformed with the maize oleosin gene. Proc. Natl. Acad. Sci. U.S.A. 88, 6181.

[0374] Levin, J. M., B. Robson, J. Garnier, FEBS Lett. 205, 303 (1986).

[0375] Lim, J. K. and M. J. Simmons. 1994. Gross chromosomal rearrangements mediated by transposable elements in Drosophila melanogaster. Bioessays 16, 269-275.

[0376] Lohnes, D G, Bernard, R I. 1992. Inheritance of resistance to powdery mildew in soybeans. Plant Disease 76: 964.

[0377] Lohning, C. and M. Ciriacy. 1994. The TYE7 gene of Saccharomyces cerevisiae encodes a putative bHLH-LZ transcription factor required for Ty1-mediated gene expression. Yeast 10, 1329-1339.

[0378] Lupas, A., M. Van Dyke, J. Stock, Science 252, 1162 (1991).

[0379] Luzzi, B M, Boerma, H R, Hussey, R S. 1994. A gene for resistance to the soybean root-knot nematode in soybean. J. Hered. 85: 484.

[0380] Luzzi, B M, Boerma, H R, Hussey, R S. 1994. Inheritance of resistance to the soybean root-knot nematode in soybean. Crop Sci. 34: 1240.

[0381] Ma, G., P. Chen, G. R. Buss, S. A. Tolin. 1995. Genetic characteristics of two genes for resistance to soybean mosaic virus in PI 486355 soybean. Theor. Appl. Genetics 91:907.

[0382] Mansky, L. M., D. P. Durand and J. H. Ell. 1991. Effects of temperature on the maintenance of resistance to soybean mosaic virus in soybean. Phytopathol. 81, 535-538.

[0383] Marek, L. F., J. Mudge, L. Darnielle, D. Grant, N. Hanson, M. Paz, H. H. Yan, R. Denny, K. Larson, D. Foster-Hartnett, A. Cooper, D. Danesh, D. Larsen, T. Schmidt, R. Staggs, J. A. Crow, E. Retzel, N. D. Young and R. C. Shoemaker. 2001. Soybean genomic survey: BAC-end sequences near RFLP and SSR markers. Genome 44:572-581.

[0384] Matthews, R. E. F., Plant Virology (Academic Press, New York, 1991).

[0385] McClintock, B. 1984. The significance of responses of the genome to challenge. Science 226, 792-801.

[0386] McDonald, J. F. 1990. Macroevolution and retroviral elements. BioScience 40, 183-191.

[0387] McDonald, J. F. 1990. Evolution and consequences of transposable elements. Curr. Opin. Genet. Devel. 3, 855-864.

[0388] McDonald, J. F., D. J. Strand, M. R. Brown, S. M. Paskewitz, A. K. Csink and S. H. Voss. 1988. Evidence of host-mediated regulation of retroviral element expression at the posttranscriptional level. In Eukaryotic Transposable Elements as Mutagenic Agents, M. E. Lambert, J. F. McDonald and I. B. Weinstein, eds., Cold Spring Harbor Laboratory, New York, pp. 219-234.

[0389] McEntee, K. and V. A. Bradshaw. 1988. Effects of DNA damage on transcription and transposition of Ty retrotransposons of yeast. In Eukaryotic Transposable Elements as Mutagenic Agents, M. E. Lambert, J. F. McDonald and I. B. Weinstein, eds., Cold Spring Harbor Laboratory, New York, pp. 245-253.

[0390] Mellentin-Michelotti, J., S. John, W. D. Pennie, T. Williams and G. L. Hager. 1994. The 5′ enhancer of the mouse mammary tumor virus long terminal repeat contains a functional AP-2 element. J. Biol. Chem. 269, 31983-31990.

[0391] Merkulov, G. V., K. M. Swiderek, C. B. Brachmann and J. D. Boeke. 1996. A critical proteolytic cleavage site near the C terminus of the yeast retrotransposon Ty1 Gag protein. J. Virology 70:5548-5556.

[0392] Moreira, M A, Barros, E G, Sediyama, C S, Sediyama, T. 1996. Breeding soybean for high quality seeds assisted by molecular markers. (Abstract) Plant Genome IV, San Diego, 1996.

[0393] Murphy, J. E., and S. P. Goff. 1988. Construction and analysis of deletion mutations in the U5 region of Moloney murine leukemia virus: effects on RNA packaging and reverse transcription. J. Virol. 63, 319-327.

[0394] Mushegian, A. R. and E. V. Koonin, Arch Virol. 133, 239 (1993).

[0395] Nathan, M., L. M. Mertz and D. K. Fox. 1995. Optimizing long RT-PCR. Focus 17, 78-80.

[0396] Navot, N., R. Ber, and H. Czosnek. 1989. Rapid detection of tomato yellow leaf curl virus in squashes of plant and insect vectors. Phytopathology 79, 562.

[0397] Nei, M. and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

[0398] Nelson, R. S., S. M. McCormick, X. Delannay, P. Dube, J. Layton, E. J. Anderson, M. Kaniewska, R. K. Proksch, R. B. Horsch, S. G. Rogers, R. T. Fraley, and R. N. Beachy. 1993. Virus tolerance, plant growth, and field performance of transgenic tomato plants expressing coat protein from tobacco mosaic virus. Bio/Technology 6, 403.

[0399] Ngeleka, K, Smith, O D. 1993. Inheritance of stem canker resistance in soybean cultivars Crockett and Dowling. Crop Sci. 33: 67.

[0400] Padgette, S. R., N. B. Taylor, D. L. Nida, M. R. Bailey, J. MacDonald, L. R. Holden, R. L. Fuchs. 1996. The composition of glyphosate-tolerant soybean seeds is equivalent to that of conventional soybeans. J. Nutr. 126:702.

[0401] Palmgren, M. G. 1994. Capturing of host DNA by a plant retroelement: Bs1 encodes plasma membrane H+-ATPase domains. Plant Mol. Biol. 25, 137-140.

[0402] Paquin, E. and V. M. Williamson. 1988. Effect of temperature on Ty transposition. In Eukaryotic Transposable Elements as Mutagenic Agents, M. E. Lambert, J. F. McDonald and I. B. Weinstein, eds., Cold Spring Harbor Laboratory, New York, pp. 235-244.

[0403] Patience, C., D. A. Wilkenson, R. A. Weiss, Trends Genet. 13, 116 (1997).

[0404] Pearl, L. H. and W. R. Taylor. 1987. A structural model for the retroviral proteases. Nature 329, 351-354.

[0405] Perlak, F. J., R. L. Fuchs, D. A. Dean, S. L. McPherson, and D. A. Fischoff. 1991. Modification of the coding sequence enhances plant expression of insect control protein genes. Proc. Natl. Acad. Sci. U.S.A. 88, 3324.

[0406] Peschke, V. M. and R. L. Phillips. 1991. Activation of the maize transposable element Suppressor-mutator (Spm) in tissue culture. Theor. Appl. Genet. 81, 90-97.

[0407] Peschke, V. M., R. L. Phillips and B. G. Gengenbach. 1991. Genetic and molecular analysis of tissue culture-derived Ac elements. Theor. Appl. Genet. 82, 121-129.

[0408] Peterson-Burch, B. D. and D. F. Voytas. 2002. Genes of the Pseudoviridae (Ty1/copia retrotransposons). Mol. Biol. Evol. 19:1832-1845.

[0409] Phillips, D, Boerma, B R. 1982. Two genes for resistance to race 5 of Cercospora sojina in soybeans. Phytopathol. 72: 764.

[0410] Pinter, A., and W. J. Honnen, J. Virology 62, 1016 (1988).

[0411] Pouteau, S., M. -A. Grandbastien and M. Boccara. 1994. Microbialelicitors of plant defense responses activate transcription of aretrotransposon. Plant J. 5, 535-542.

[0412] Prabhu, R., T. W. Doubler, S. I. C. Chang and D. A. Lightfoot. 1996. Development of sequence characterized amplified regions (SCARs) for marker-assisted selection of soybean lines resistant to sudden death syndrome. (Abstract) Plant Genome IV, San Diego, 1996.

[0413] Prestridge, D. S. 1995. Predicting pol II promoter sequences using transcription factor binding sites. J. Mol. Biol. 249:923-932.

[0414] Qian, D., F. L. Allen, G. Stacey and P. M. Gresshoff. 1996. Plant genetic study of restricted nodulation in soybean. Crop Sci. 36(2):243-249.

[0415] Rao-Arelli, A. P., S. C. Anand and A. Wrather. 1992. Soybean resistance to soybean cyst nematode race 3 is conditioned by an additional dominant gene. Crop Sci. 32: 862.

[0416] Reese, M. G. 2001. Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comp. Chem. 26:51-56.

[0417] Rezaian, M. A., K. G. M. Skene and J. G. Ellis. 1988. Antisense RNAs of cucumber mosaic virus in transgenic plants assessed for control of the virus. Plant Mol. Biol. 11, 463.

[0418] Rhode, B. W., M. Emerman and H. M. Temin. 1987. Instability of large direct repeats in retrovirus vectors. J. Virology 61:925-927.

[0419] Rio, D. C. 1990. Molecular mechanisms regulating Drosophila P element transposition. Annu. Rev. Genet. 24, 543-578.

[0420] Robertson, H. D., S. H. Howell, M. Zaitlin and R. L. Malmberg, eds. 1983. Plant Infectious Agents: Viruses, Viroids, Virusoids, and Satellites. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

[0421] Robins, D. M. and L. C. Samuelson. 1993. Retrotransposons and the evolution of mammalian gene expression. In Transposable Elements and Evolution, J. F. McDonald, ed., Kluwer, Dordrecht, pp. 515.

[0422] Roth, E. J., B. L. Frazier, N. R. Apuya and K. G. Lark. 1989. Genetic variation in an inbred plant: variation in tissue cultures of soybean (Glycine max (L.) Merrill). Genetics 121: 359-368.

[0423] Saigo, K., W. Kugiyama, Y. Matsuo, S. Inouye, K. Yoshioka and S. Yuki. 1984. Nature 312, 659.

[0424] Sambrook, J., E. F. Fritsch and T. Maniatis. 1989. Molecular Cloning. Cold Spring Harbor Laboratory: New York.

[0425] Sandmeyer, S. B., L. J. Hansen and D. L. Chalker. 1990. Integration-specificity of retrotransposons and retroviruses. Annu. Rev. Genet. 24, 491-518.

[0426] Sanger, F., S. Nicklen and A. R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.

[0427] SanMiguel, P., A. Tikhonov, Y.-K. Jin, N. Motchoulskaia, D. Zakharov, A. Melake-Berhan, P. S. Springer, K. J. Edwards, M. Lee, Z. Avramova and J. L. Bennetzen. 1996. Science 274, 765.

[0428] Schwarz-Sommer, Z. and H. Saedler. 1987. Can plant transposable elements generate novel regulatory systems? Mol. Gen. Genet. 209, 207-209.

[0429] Schwarz-Sommer, Z. and H. Saedler. 1988. Transposition and retrotransposition in plants. In Plant Transposable Elements, O. Nelson, ed., Plenum Press, New York, pp. 175-187.

[0430] Shah, D. M. et al. 1986. Science 233, 478.

[0431] Shapiro, J. A. 1983. Mobile Genetic Elements. New York: Academic Press.

[0432] Shapiro, J. A. 1992. Natural genetic engineering in evolution. Genetica 86, 99-111.

[0433] Sheridan, M. A. and R. G. Palmer. 1977. The effect of temperature on an unstable gene in soybeans. J. Hered. 68, 17-22.

[0434] Shih, C. C., J. P. Stoye and J. M. Coffin. 1988. Highly preferred targets for retrovirus integration. Cell 53, 531-537.

[0435] Shoemaker, R., S. Zhao, V. Kanazin and L. Marek. 1996. Phytophthora root rot resistance gene mapping in soybean. (Abstract) Plant Genome IV, San Diego, 1996.

[0436] Shoemaker, R. C., L. A. Amberger, R. G. Palmer, L. Oglesby and J. P. Ranch. 1991. Effect of 2,4-dichlorophenoxyacetic acid concentration on somatic embryogenesis and heritable variation in soybean [Glycine max (L.) Merr.]. In Vitro Cell. Dev. Biol. 27P, 84-88.

[0437] Skuzeski, J. M., L. M. Nichols, R. F. Gesteland and J. F. Atkins. 1991. The signal for a leaky UAG stop codon in several plant viruses includes the two downstream codons. J. Mol. Biol. 218:365-373.

[0438] Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98, 503.

[0439] Sugimoto, K., S. Takeda and H. Hirochika. 2000. MYB-related transcription factor NtMYB2 induced by wounding and elicitors is a regulator of the tobacco retrotransposon Tto1 and defense-related genes. Plant Cell 12:2511-2527.

[0440] Suoniemi, A., A. Narvanto and A. H. Schulman. 1996. The BARE-1 retrotransposon is transcribed in barley from an LTR promoter active in transient assays. Plant Mol. Biol. 31:295-306.

[0441] Switzer, W. M. and W. Heneine. 1995. Rapid screening of open reading frames by protein synthesis with an in vitro transcription and translation system. BioTechniques 18, 244-248.

[0442] Takahashi, R. and S. Asanuma. 1996. Association of T gene with chilling tolerance in soybean. Crop Sci. 36:559.

[0443] Takeda, S., K. Sugimoto, H. Otsuki and H. Hirochika. 1999. A 13-bp cis-regulatory element in the LTR promoter of the tobacco retrotransposon Tto1 is involved in responsiveness to tissue culture, wounding, methyl jasmonate and fungal elicitors. Plant J. 18:383-393.

[0444] Tanda, S., J. L. Mullor and V. G. Corces. 1994. Mol. Cell. Biol. 14, 5392.

[0445] Temin, H. M. 1993. Retrovirus variation and reverse transcription: abnormal strand transfers result in retrovirus genetic variation. Proc. Natl. Acad. Sci. USA 90:6900-6903.

[0446] Titus, D. E. 1991. Promega Protocols and Applications Guide. Madison, Wis.

[0447] Vaeck, M., A. Reynaerts, H. Hofte, S. Jansens, M. De Beuckeleer, C. Dean, M. Zabeau, M. Van Montagu and J. Leemans. 1987. Transgenic plants protected from insect attack. Nature 328, 33.

[0448] Vandenheuvel, J. F. J. M., A. W. E. Franz and F. Vanderwilk. 2002. Molecular basis of virus transmission. In Molecular Biology of Plant Viruses, C. L. Mandahar, ed., Kluwer, Boston, pp. 183-210.

[0449] Varmus, H. and P. Brown. 1989. Retroviruses. In Mobile DNA, D. E. Berg and M. M. Howe, eds., pp. 53-108.

[0450] Varmus, H. and P. Brown. 1989. In Mobile DNA, D. E. Berg and M. M. Howe, eds., ASM, Washington, D.C., pp. 53-108.

[0451] Varmus, H. E. 1982. Form and function of retroviral proviruses. Science 216, 812-821.

[0452] Viguera, E., D. Canceill and S. D. Ehrlich. 2001. Replication slippage involves DNA polymerase pausing and dissociation. EMBO J. 20:2587-2595.

[0453] Voytas, D. F., M. P. Cummings, A. Konieczny, F. M. Ausubel and S. R. Rodermel. 1992. copia-like retrotransposons are ubiquitous among plants. Proc. Natl. Acad. Sci. USA 89, 7124-7128.

[0454] Watson, J. D., N. H. Hopkins, J. W. Roberts, J. A. Steitz and A. M. Weiner. 1987. Molecular Biology of the Gene. Menlo Park: Benjamin/Cummings Publishing.

[0455] Waugh, R. and J. W. S. Brown. 1991. Plant gene structure and expression. In Plant Genetic Engineering, D. Grierson, ed., Chapman and Hall, New York, pp. 1-37.

[0456] Weil, C. F. and S. R. Wessler. 1990. The effects of plant transposable element insertions on transcription initiation and RNA processing. Annu. Rev. Plant Physiol. Plant Mol. Biol. 41, 527-552.

[0457] White, S. E., L. F. Habera and S. R. Wessler. 1994. Retrotransposons in the flanking regions of normal plant genes: a role for copia-like elements in the evolution of gene structure and expression. Proc. Natl. Acad. Sci. USA 91, 11792-11796.

[0458] Williamson, M. P. 1994. Biochem. J. 297, 249.

[0459] Wilson, I. B. H., Y. Gavel and G. von Heijne. 1991. Biochem. J. 275, 529.

[0460] Wu, S. C., Q. Lu, A. L. Kriz and J. E. Harper. 1995. Identification of cDNA clones corresponding to two inducible nitrate reductase genes in soybean: analysis in wild-type and NR(1) mutant. Plant Mol. Biol. 29:491-506.

[0461] Yanagisawa, S. and J. Sheen. 1998. Involvement of maize Dof zinc finger proteins in tissue-specific and light-regulated gene expression. Plant Cell 10:75-89.

[0462] Yanagisawa, S. and R. J. Schmidt. 1999. Diversity and similarity among recognition sequences of Dof transcription factors. Plant J. 17:209-214.

[0463] Young, N. D. 1996. Genome analysis of soybean cyst nematode resistance in soybean. (Abstract) Plant Genome IV, San Diego, 1996.

[0464] Yu, Y. G., M. A. S. Maroof and G. R. Buss. 1996. Divergence and allelomorphic relationship of a soybean virus resistance gene based on tightly linked DNA microsatellite and RFLP markers. Theor. Appl. Genet. 92:64.

[0465]

1 105 1 22 DNA Artificial sequence Synthetic primer 1 tnttngatcgkgtncartgc tg 22 2 776 DNA Glycine max misc_feature SIRE 1 fragment fromGlycine max genomic DNA 2 tattggatcg ggtgcagtgc tgtttttggc aggaacaaattatgtcatgg ttgttctgcc 60 agcagattta tgattaaatc caagtcctct ctggtttccaacattcttcc caagctgtag 120 cacctcatca agcaaatttg agcctttatt cagcatctttattgattttg tcatgttttc 180 cagtttagag ttcagaaaac caatttctcc tttaagttcagagatttcct cttcatgtgc 240 ctccttctca gcctccagat ttgcaatgac cttctttagttgtgcttctt gctgaagaat 300 cttctcactt ttgatgcata gttctctata ggatatagcaagctcatcaa aagtgatttc 360 actatctgta tcacttgaat cttcagcaga ttcaaatctcccagtgagtg cattcacatc 420 tctgtcagaa tcacttcttg ttcactctct gtatcatcagaccgacatac agaaagtcct 480 ttcctctgct tcttgagatg agtgggacat tcagctttgatgtgtccata gccttcacac 540 ccatggcatt gaattccttt gctgtgactg ggcttttcatctgacctttt ctggtattca 600 ctacctttcc tgatgtcgaa agggatgttc cggacatgtggtttctgcct cctgtccatt 660 ctgttcagca ctttgttgaa ctgttttcca aggagcacaactgcgttagt cagaccttca 720 tcagtatcca ggtcatactc atcttcttct ccttcagcactgcacccgat ccaata 776 3 2417 DNA Glycine max misc_feature SIRE 1 cDNAclone 3 tccggtccct ggcttggtag cccccagatg taggtgaggt tgcaccgaactgggttaaca 60 attctcttgt gttagttact tgtttaatct gttcatacag tcaaacataatctgcatgtt 120 ctgaagcgtg atgtcgtgac atccggtacg acatctgtca ttggtatcagaatttcaatt 180 ggtatcagag caggcactcg aattcactga gtgagatcta gggagataaattctgatgaa 240 catggagaaa gaaggaggac cagtgaacag accaccaatt ctggatggaaccaactatga 300 atactggaaa gcaaggatgg tggccttcct caaatcactg gatagcagaacctggaaagc 360 tgtcatcaaa gactgggaac atcccaagat gctggacaca gaaggaaagcccactgatgg 420 attgaagcca gaagaagact ggactaaaga agaagacgaa ttggcacttggaaactccaa 480 agctttgaat gctctattca atggagttga caagaatatc ttcagactgatcaacacatg 540 cacagtggcc aaggatgcat gggagatcct gaaaaccact catgaaggaacctccaaagt 600 gaagatgtcc agattgcaac tattggccac aaaattcgaa aatctgaagatgaaggagga 660 agagtgtatt catgactttc acatgaacat tcttgaaatt gccaatgcttgcactgcctt 720 gggagaaaga atgactgatg aaaagctggt gagaaagatc 
ctcagatccttgcctaagag 780 atttgacatg aaagtcactg caatagagga ggcccaagac atttgcaacctgagagtaga 840 tgaactcatt ggttcccttc aaacctttga gctaggactc tcggataggactgaaaagaa 900 gagcaagaat ctggcgttcg tgtccaatga tgaaggagaa gaagatgagtatgacctgga 960 tacagatgaa ggtctgacta atgcagttgt gctccttgga aaacagttcaacaaagtgct 1020 gaacagaatg gacaggaggc agaaaccaca tgtccggaac atccctttcgacatcaggaa 1080 aggtagtgaa taccagaaaa ggtcagatga aaagcccagt cacagcaaaggatttcaatg 1140 ccatgggtgt gaaggctatg gacacatcaa agctgaatgt cccactcatctcaagaagca 1200 gaggaaagga ctttctgtat gtcggtctga tgatacagag agtgaacaagaaagtgattc 1260 tgacagagat gtgaatgcac tcactgggag atttgaatct gctgaagattcaagtgatac 1320 agacagtgaa atcacttttg atgagcttgc tacatcctat agagaactatgcatcaaaag 1380 tgagaagatt cttcagcaag aagcacaact gaagaaggtc attgcaaatctggaggctga 1440 gaaggaggca catgaagagg agatctctga gcttaaagga gaagttggttttctgaactc 1500 taaactggaa aacatgacaa aatcaataaa gatgctgaat aaaggctcagatatgcttga 1560 tgaggtgcta cagcttggga agaatgttgg aaaccagaga ggacttgggtttaatcataa 1620 atctgctggc agaataacca tgacagaatt tgttcctgcc aaaatcagcactggagccac 1680 gatgtcacaa catcggtctc gacatcatgg aacgcagcag aaaaagagtaaaagaaagaa 1740 gtggaggtgt cactactgtg gcaagtatgg tcacataaag cccttttgctatcatctaca 1800 tggccatcca catcatggaa ctcaaagtag cagcagcaga aggaagatgatgtgggttcc 1860 aaaacacaag attgtcagtc ttgttgttca tacttcactt agagcatcagctaaggaaga 1920 ttggtaccta gatagcggct gttccagaca catgacagga gtcaaagaatttctggtgaa 1980 cattgaaccc tgctccacta gctatgtgac atttggagat ggctctaaaggaaagatcac 2040 tggaatggga aagctagtcc atgatggact tcgttatgtc aaggaataagatcgggctgc 2100 acaatgcaca aggcaagata aaatgtcaaa tgaagaattg aagctgcaggatccatgatg 2160 tcggatacaa tgtccaggac atcctgcccg aaaatactgg agttgctgcacaatgcacaa 2220 ggcaagataa aagaagtgaa gctgcaggat ccacgatgtc ggatacgatgtccaggacat 2280 ctggcccgaa aatactggac acataaatct gttatatctt taacagattattgtgcagtt 2340 agcaacaggt tagacgatct atctttagga acgaactctt ctagttccggaattcgagct 2400 cggtacccgg ggatcct 2417 4 14 PRT Glycine max 4 Cys HisGly Cys Glu Gly Tyr Gly His Ile 
Lys Ala Glu Cys 1 5 10 5 10 PRT Glycinemax 5 Leu Asp Ser Gly Cys Ser Arg His Met Thr 1 5 10 6 22 DNA Glycinemax 6 tggtatcaga gcaggcactc ga 22 7 17 DNA Glycine max misc_feature 3′end of SIRE 1 element (sequence is identified in the 5′ to 3′ directionin the sequence listing and in the 3′ to 5′ direction in Figure 12 7actttaagac tatggtt 17 8 4224 DNA Glycine max misc_feature SIRE 1 genomicDNA clone 8 gctcgcggcc gcgagctcta atacgactca ctatagggcg tcgactcgatcttgttgatg 60 ataaagttat cacactggag catgttgaca ctgaggaaca aatagcagatattttcacaa 120 aggcattgga tgcaaatcag tttgaaaaac tgaggggcaa gctgggcatttgtctgctag 180 aggatttata gcaattactt ttatctgaac gtgcttaaac gttaatagcgcgttctctac 240 tgggccaaaa caaattcgac cgttgcttca cacgtccctc tacattcctcattcaaactc 300 atattttcgt ggtaatctcg ttttcagcat tccccaacag ctctcagagatttacgaaac 360 cattccaaag gctctgcttc tccatggcta cctcaccaaa agatacttcatctcctggtt 420 caccctctgt accatcatct ccatcatcca ccaaagcacc atcaaaccaggaacaacctg 480 aattccatat ccaacccata caaatgattc ctggtctagc ccctgttcctgagaaactgg 540 tccccataag acaacaggga gtgaagattt ctgaaaaccc tagcattgcaacaagtccta 600 gggaattgac acgggagatg gataagaaga tccgcagtat tgtgagtagtattctgaaaa 660 atgcttctgt ccctgatgct gataaagatg ttccaacatc ttccaccccaaatgctgaag 720 tcctctcttc atccagtaaa gaggaatcaa cagaggaaga ggaacaagccacagaggaga 780 cccctgcacc aagggcacca gaacctgctc caggtgacct cattgacctagaagaagtag 840 aatctgatga ggaacccatt gccaacaagt tggcacctgg cattgcagaaagattacaaa 900 gcagaaaggg aaaaaccccc attactaggt ctggacgaat caaaactatggcacagaaga 960 agagcacacc aatcactcct accacatcca gatggagcaa agttgcaatcccttccaaga 1020 agaggaaaga attttcctca tctgattctg atgatgatgt cgaactagatgttcccgaca 1080 tcaagagggc caagaaatct gggaaaaagg tgcctggaaa tgtccctgatgcaccattgg 1140 acaacatttc attccactcc attggcaatg ttgaaaggtg gaaatttgtatatcaacgca 1200 gacttgcctt agaaagagaa ctgggaagag atgccttgga ttgcaaggagatcatggacc 1260 tcatcaaggg ctgctggact gctgaaaaca gtcaccaagt tgggagatgttatgaaagcc 1320 tagtcaggga attcattgtc aacattccct ctgacataac aaacagaaagagtgatgagt 1380 
atcagaaagt gtttgtcaga ggaaaatgtg ttagattctc ccctgctgtaatcaacaaat 1440 acctgggcag acctactgaa ggagtggtgg atattgctgt ttctgagcatcaaattgcca 1500 aggaaatcac tgccaaacaa gtccagcatt ggccaaagaa agggaagctttctgcaggga 1560 agctaagtgt gaagtatgca atcctgcaca ggattggcgc tgcaaactgggtacccacca 1620 atcatacttc cacagttgcc acaggtttgg gtaaatttct gtatgctgttggaaccaagt 1680 ccaaatttaa ttttggaaag tatatttttg atcaaactgt taagcattcagaatcatttg 1740 ctgtcaaatt acccattgcc ttcccaactg tattgtgtgg cattatgttgagtcaacatc 1800 ccaatatttt aaacaacatt gactctgtga tgaagaaaga atcggctctgtccctgcatt 1860 acaaactgtt tgaggggaca catgtcccag acattgtctc gacatcagggaaagctgctg 1920 cttcaggtgc tgtatccaag ggatgctttg attgctgaac tcaaggacacatgcaaggtg 1980 ctggaagcaa ccatcaaagc caccacagag aagaaaatgg agctggaacgcctgatcaaa 2040 agactctcag acagtggcat tgatgatggt gaagcagctg aggaagaagaagaagccgct 2100 gaggaagaga aagatgcagc agaagataca gaatcagatg atgatgattctgatgccacc 2160 ccatgaccat cagaccttta tttttgcttt ttactcttac tagctatagggcatgtccct 2220 ttgaacaatt gattgctatt ggtctgtaat atttgcatgc attctacttttgtcaaattc 2280 tgtctaaaaa ggggatatat attatgcatg attttgagta gtagatactatgttgcaata 2340 gtatattatg cataatttat gattttgagt agtaggatac gatgtatgcatgattcatga 2400 ttttgagggg gagttgtaag tatatgattt tgagggggag tagtatctgatgatgctgat 2460 agaagatggc atggagacag ggggagcaga aagctgatgt cacgtgagatgtcttgacat 2520 cctggaaacg acttgcaact tgcagaattt tgctgtcgcc cctacagataccgctgtgct 2580 tgattactct gataatgaaa gttgctgatc ccacttgcat aactgctcgtacctgctcag 2640 gaagtgtcta agtatgtttt agacaaaatt tgccaaaggg ggagattgttagtgcttagc 2700 tttactgagt tttaaaagat tggctaaaat tttgttaaaa cataagcacttagacaatga 2760 aggaaagctg gagttgctgc acaggatgtc caacgttatg tcaaggaatcagattgggct 2820 ccacaatgca caaggcaaga taaaaggtca aatgaagaat tgaagctgcaggatccacga 2880 tgtcggatac aatgtccagg acatcctgcc cgaaaatact ggacacataaatctgttata 2940 tctttaacag attaatgtgc agttagcaac agatttggcg atctatctttaggaacgaat 3000 taaaagataa ttaaagttcg aattacaaac ttgaatagtt cgttcagggattaaagatta 3060 aagataaaaa ctaaaagatc aaactgtatc 
ttttagatct ttaagtgcagatttttcagg 3120 agaatgatag atcttatcca gcgcaagatg ttgcagccca gatacgcacactgctatata 3180 aacatgaagg ctgcacgagt tttctaccaa gtccgggatt gaagagttattttgtgagtt 3240 ttgggacttg agtgttttgt gagccacctt gatgttaccc taacatcaagtgttggacct 3300 gagtgtgtag agttgatctc tattgttcag agagcaatct ctggtgtgtctttgatttat 3360 ttgtaaacac gggagagtga ttgagaggga gtgagagggg ttctcatatctaagagtggc 3420 tcttaggtag aggttgcacg ggtagtggtt aggtgagaag gttgtaaacagtggctgtta 3480 gatcttcgaa ctaacactat tttagtggat ttcctccctg gcttggtagcccccagatgt 3540 aggtgaggtt gcaccgaact gggttaacaa ttctcttgtg ttatttacttgtttaatctg 3600 ttcatactgt caaatataat ctgcatgttc tgaagcgtga tgtcgtgacatccggtacga 3660 catctgtcat tggtatcaga atttcatgct gcaaatattt acaatagacctcctcaacct 3720 caacagcaaa atcaaccaca gcagaacaat tatgacctct ccagcaacagatacaaccct 3780 ggatggagga atcaccctaa cctcagatgg tccagccctc agcaacaacaacagcagcct 3840 gctccttcct tccaaaatgc tgttggccca agcagaccat acattcctccaccaatccaa 3900 caacagcaac aaccccagaa acagccaaca gttgaggccc tccacaacttccttcgaaga 3960 acttgtgagg caaatgacta tgcagaacat gcagtttcag caagagactagagcctccat 4020 tcagagctta accaatcaga tgggacaatt ggctacccaa ttgaatcaacaacagtccca 4080 gaattctgac aagttgcctt ctcaagctgt ccaaaatccc aaaaatgtcagtgccatttc 4140 attgaggtcg ggaaagcagt gtcaaggacc tcaacccgta gcaccttcctcatctgcaaa 4200 tgaacctgcc aaacttcact ctac 4224 9 62 PRT Glycine maxmisc_feature ORF1 9 Ser Arg Pro Arg Ala Leu Ile Arg Leu Thr Ile Gly ArgArg Leu Asp 1 5 10 15 Leu Val Asp Asp Lys Val Ile Thr Leu Glu His ValAsp Thr Glu Glu 20 25 30 Gln Ile Ala Asp Ile Phe Thr Lys Ala Leu Asp AlaAsn Gln Phe Glu 35 40 45 Lys Leu Arg Gly Lys Leu Gly Ile Cys Leu Leu GluAsp Leu 50 55 60 10 579 PRT Glycine max 10 Thr Leu Ile Ala Arg Ser LeuLeu Gly Gln Asn Lys Phe Asp Arg Cys 1 5 10 15 Phe Thr Arg Pro Ser ThrPhe Leu Ile Gln Thr His Ile Phe Val Val 20 25 30 Ile Ser Phe Ser Ala PhePro Asn Ser Ser Gln Arg Phe Thr Lys Pro 35 40 45 Phe Gln Arg Leu Cys PheSer Met Ala Thr Ser Pro Lys Asp Thr Ser 50 55 60 Ser Pro Gly Ser Pro SerVal Pro Ser 
Ser Pro Ser Ser Thr Lys Ala 65 70 75 80 Pro Ser Asn Gln GluGln Pro Glu Phe His Ile Gln Pro Ile Gln Met 85 90 95 Ile Pro Gly Leu AlaPro Val Pro Glu Lys Leu Val Pro Ile Arg Gln 100 105 110 Gln Gly Val LysIle Ser Glu Asn Pro Ser Ile Ala Thr Ser Pro Arg 115 120 125 Glu Leu ThrArg Glu Met Asp Lys Lys Ile Arg Ser Ile Val Ser Ser 130 135 140 Ile LeuLys Asn Ala Ser Val Pro Asp Ala Asp Lys Asp Val Pro Thr 145 150 155 160Ser Ser Thr Pro Asn Ala Glu Val Leu Ser Ser Ser Ser Lys Glu Glu 165 170175 Ser Thr Glu Glu Glu Glu Gln Ala Thr Glu Glu Thr Pro Ala Pro Arg 180185 190 Ala Pro Glu Pro Ala Pro Gly Asp Leu Ile Asp Leu Glu Glu Val Glu195 200 205 Ser Asp Glu Glu Pro Ile Ala Asn Lys Leu Ala Pro Gly Ile AlaGlu 210 215 220 Arg Leu Gln Ser Arg Lys Gly Lys Thr Pro Ile Thr Arg SerGly Arg 225 230 235 240 Ile Lys Thr Met Ala Gln Lys Lys Ser Thr Pro IleThr Pro Thr Thr 245 250 255 Ser Arg Trp Ser Lys Val Ala Ile Pro Ser LysLys Arg Lys Glu Phe 260 265 270 Ser Ser Ser Asp Ser Asp Asp Asp Val GluLeu Asp Val Pro Asp Ile 275 280 285 Lys Arg Ala Lys Lys Ser Gly Lys LysVal Pro Gly Asn Val Pro Asp 290 295 300 Ala Pro Leu Asp Asn Ile Ser PheHis Ser Ile Gly Asn Val Glu Arg 305 310 315 320 Trp Lys Phe Val Tyr GlnArg Arg Leu Ala Leu Glu Arg Glu Leu Gly 325 330 335 Arg Asp Ala Leu AspCys Lys Glu Ile Met Asp Leu Ile Lys Gly Cys 340 345 350 Trp Thr Ala GluAsn Ser His Gln Val Gly Arg Cys Tyr Glu Ser Leu 355 360 365 Val Arg GluPhe Ile Val Asn Ile Pro Ser Asp Ile Thr Asn Arg Lys 370 375 380 Ser AspGlu Tyr Gln Lys Val Phe Val Arg Gly Lys Cys Val Arg Phe 385 390 395 400Ser Pro Ala Val Ile Asn Lys Tyr Leu Gly Arg Pro Thr Glu Gly Val 405 410415 Val Asp Ile Ala Val Ser Glu His Gln Ile Ala Lys Glu Ile Thr Ala 420425 430 Lys Gln Val Gln His Trp Pro Lys Lys Gly Lys Leu Ser Ala Gly Lys435 440 445 Leu Ser Val Lys Tyr Ala Ile Leu His Arg Ile Gly Ala Ala AsnTrp 450 455 460 Val Pro Thr Asn His Thr Ser Thr Val Ala Thr Gly Leu GlyLys Phe 465 470 475 480 Leu Tyr Ala Val Gly Thr Lys Ser Lys Phe Asn PheGly Lys Tyr Ile 485 
490 495 Phe Asp Gln Thr Val Lys His Ser Glu Ser PheAla Val Lys Leu Pro 500 505 510 Ile Ala Phe Pro Thr Val Leu Cys Gly IleMet Leu Ser Gln His Pro 515 520 525 Asn Ile Leu Asn Asn Ile Asp Ser ValMet Lys Lys Glu Ser Ala Leu 530 535 540 Ser Leu His Tyr Lys Leu Phe GluGly Thr His Val Pro Asp Ile Val 545 550 555 560 Ser Thr Ser Gly Lys AlaAla Ala Ser Gly Ala Val Ser Lys Gly Cys 565 570 575 Phe Asp Cys 11 62PRT Glycine max 11 Ser Arg Pro Arg Ala Leu Ile Arg Leu Thr Ile Gly ArgArg Leu Asp 1 5 10 15 Leu Val Asp Asp Lys Val Ile Thr Leu Glu His ValAsp Thr Glu Glu 20 25 30 Gln Ile Ala Asp Ile Phe Thr Lys Ala Leu Asp AlaAsn Gln Phe Glu 35 40 45 Lys Leu Arg Gly Lys Leu Gly Ile Cys Leu Leu GluAsp Leu 50 55 60 12 23 DNA Artificial sequence Synthetic primer 12cccagtcacg acgttgtaaa acg 23 13 19 DNA Artificial sequence Syntheticprimer 13 tcctttaagt tcagagatt 19 14 23 DNA Artificial sequenceSynthetic primer 14 agcggataac aatttcacac agg 23 15 24 DNA Artificialsequence Synthetic primer 15 gtaatggtca accagaccac agtt 24 16 17 DNAArtificial sequence Synthetic primer 16 gacgaattgg cacttgg 17 17 18 DNAArtificial sequence Synthetic primer 17 tttgcactgc cttgggag 18 18 17 DNAArtificial sequence Synthetic primer 18 ccaaggagca caactgc 17 19 20 DNAArtificial sequence Synthetic primer 19 gctgaacaga atggacagga 20 20 19DNA Artificial sequence Synthetic primer 20 aaagatataa caagattta 19 2120 DNA Artificial sequence Synthetic primer 21 cccgatctta ttccttgaca 2022 18 DNA Artificial sequence Synthetic primer 22 cttgccacag tagtgaca 1823 18 DNA Artificial sequence Synthetic primer 23 tcttcccaag ctgtagca 1824 19 DNA Artificial sequence Synthetic primer 24 tcctttaagt tcagagatt19 25 20 DNA Artificial sequence Synthetic primer 25 agcgcgttctctactgggcc 20 26 20 DNA Artificial sequence Synthetic primer 26ccaccaaagc accatcaaac 20 27 20 DNA Artificial sequence Synthetic primer27 ggcacagaag aagagcacac 20 28 20 DNA Artificial sequence Syntheticprimer 28 tgcaaggaga tcatggacct 20 29 20 DNA Artificial sequenceSynthetic 
primer 29 cacaggattg gcgctgcaaa 20 30 29 DNA Artificialsequence Synthetic primer 30 tccctggctt ggtagccccc agatgtagg 29 31 21DNA Artificial sequence Synthetic primer 31 ggccctccac aacttccttc g 2132 20 DNA Artificial sequence Synthetic primer 32 cagatgagga aggtgctacg20 33 30 DNA Artificial sequence Synthetic primer 33 cccagttcggtgcaacctca cctacatctg 30 34 20 DNA Artificial sequence Synthetic primer34 ggtggctcac aaaacactca 20 35 20 DNA Artificial sequence Syntheticprimer 35 tgtgtccagt attttcgggc 20 36 20 DNA Artificial sequenceSynthetic primer 36 tcatcagata ctactccccc 20 37 22 DNA Artificialsequence Synthetic primer 37 cctaggactt gttgcaatgc ta 22 38 20 DNAArtificial sequence Synthetic primer 38 atgaggaatg tagagggacg 20 39 20DNA Artificial sequence Synthetic primer 39 ctcatgagtt ctctgcagcc 20 4029 DNA Artificial sequence Synthetic primer 40 gacaatgttg cagatacagctaaaagtgc 29 41 20 DNA Artificial sequence Synthetic primer 41ccagatggat gtgaagagcg 20 42 19 DNA Artificial sequence Synthetic primer42 tgggatggaa aatgccagc 19 43 20 DNA Artificial sequence Syntheticprimer 43 agaactgtgt gtccctatcc 20 44 20 DNA Artificial sequenceSynthetic primer 44 cctcagtgtc aacatgctcc 20 45 20 DNA Artificialsequence Synthetic primer 45 atcccatagt cactggtgcc 20 46 20 DNAArtificial sequence Synthetic primer 46 ctctgttagc ctttcatacc 20 47 20DNA Artificial sequence Synthetic primer 47 cttgatcttg tagtgactcc 20 4820 DNA Artificial sequence Synthetic primer 48 atacagtgtg gttggagtcc 2049 20 DNA Artificial sequence Synthetic primer 49 gaagtcttag actcaactcc20 50 2826 DNA Glycine max misc_feature SIRE 1 genomic clone 50gatgaaggat tcaatgtaga cttcacagag tcagaatgct tgatgacaaa agagaagaga 60gaagtcctaa tgaagggcgg cagatcaaag gacaactgtt acctgtggac acctcaagaa 120accagttact cctccacatg tctattctcc aaagaagatg aagtcaaaat atggcatcaa 180agatttggac atctgcactt aggaggcatg aagaaaatca ttgacaaagg tgctgttaga 240ggcattccca atctgaaaat agaagaaggc agaatctgtg gtgaatgtca gattggaaag 300caagtcaaga tgtccaacca gaagcttcaa catcagacca 
cttccagggt gctggaacta 360cttcacatgg acttgatggg gcctatgcaa gttgaaagcc ttggaagaaa aaggtatgcc 420tatgttgttg tggatgattt ctccagattt acctgggtca actttatcag agagaaatca 480gacacctttg aagtattcaa ggagttgagt ctaagacttc aaagagaaaa agactgtgtc 540atcaagagaa tcaggagtga ccatggcaga gagtttgaaa acagcaagtt tactgaattc 600tgcacatctg aaggcatcac tcatgagttc tctgcagcca ttacaccaca acaaaatggc 660atagttgaaa ggaaaaacag gaccttgcca gaagctgcta gggtcatgct tcatgccaaa 720gaacttccct ataatctctg ggctgaagcc atgaacacag catgctacat ccacaacaga 780gtcacactta gaagagggac tccaaccaca ctgtatgaaa tctggaaagg gaggaagcca 840actgtcaagc acttccacat ctgtggaagt ccatgttaca ttttggcaga tagagagcaa 900aggagaaaga tggatcccaa gagtgatgca gggatattct tgggatactc tacaaacagc 960agagcatata gagtattcaa ttccagaacc agaactgtga tggaatccat caatgtggtt 1020gttgatgatc taactccagc aagaaagaag gatgtcgaag aagatgtcag aacatcggga 1080gacaatgttg cagatacagc taaaagtgca gaaaatgcag aaaactctga ttctgctaca 1140gatgaaccaa acatcaatca acctgacaag agaccctcca ttagaatcca gaagatgcac 1200cccaaggagc tgattatagg agatccaaac agaggagtca ctacaagatc aagggagatt 1260gagattatct ccaattcatg ttttgtctcc aaaattgagc ccaagaatgt gaaagaggca 1320ctgactgatg agttctggat caatgctatg caagaagaat tggagcaatt caaaaggaat 1380gaagtttggg agctagttcc taggcccgag ggaactaatg tgattggcac caagtggatc 1440ttcaagaaca aaaccaatga agaaggtgtt ataaccagaa acaaggccag acttgttgct 1500caaggctaca ctcagattga aggtgtagac tttgatgaaa cttttgcccc tggtgctaaa 1560cttgagtcca tcagactgtt acttggtgta gcttgcatcc tcaaattcaa gctgtaccag 1620atggatgtga agagcgcatt tctgaatgga tacctgaatg aagaagccta tgtggagcag 1680ccaaagggat ttgtagatcc aactcatcca gatcatgtat acaggctcaa gaagctctgc 1740tatggattga agcaagcttc aagagcttgg tatgaaaggc taacagagtt ccttactcag 1800caagggtata ggaagggggg gattgacaag accctttttg ttaaacaaga tgctggaaaa 1860ttgatgatag cacagatata tgttgatgac attgtgtttg gagggatgtt gaatgagatg 1920cttcgacatt ttgtccaaca gatgcaattt gaatttgaga tgagttttgt tggagagctg 1980aattattttt tgggaatcca agtgaagcag atggaagaat ccatattcct ttcacaaagc 2040aagtatgcaa agaacattgt 
caagaagttt gggatggaaa atgccagcca taaaagaaca 2100cctgcaccta atcaattgaa gctgtcaaaa gatgaagctg gcaccagtgt tgatcaaagt 2160ttgtacagaa gcatgattgg gagcttaata tatttaacag ctagcagacc tgacatcacc 2220tatgcagtag gtggttgtgc aagatatcaa gccaatccta agataagtca cttgaatcaa 2280gtaaagagaa ttttgaaata tgtaaatggc accagtgact atgggattat gtactgtcat 2340tgttcagatt caatgctggt tgggtattgt gatgctgatt gggctggaag tgtagatgac 2400agaaaaagca cttttggtgg atgtttttat ttgggaacca attttatttc atggttcagc 2460aagaagcaga actgtgtgtc cctatccact gcagaagcag agtatattgc agcaggaagc 2520agctgttcac aactagtttg gatgaagcag atgctcaagg agtacaatgt cgaacaagat 2580gtcatgacat tgtactgtga caacttgagt gctattaata tttctaaaaa tcctgttcaa 2640cacagcagaa ccaagcacat tgacattaga catcactata ttagagatct tgttgatgat 2700aaagttatca cactggagca tgttgacact gaggaacaaa tagcagatat tttcacaaag 2760gcattggatg caaatcagtt tgaaaaactg aggggcaagc tgggcatttg tctgctagag 2820gattta 2826 51 942 PRT Glycine max 51 Asp Glu Gly Phe Asn Val Asp PheThr Glu Ser Glu Cys Leu Met Thr 1 5 10 15 Lys Glu Lys Arg Glu Val LeuMet Lys Gly Gly Arg Ser Lys Asp Asn 20 25 30 Cys Tyr Leu Trp Thr Pro GlnGlu Thr Ser Tyr Ser Ser Thr Cys Leu 35 40 45 Phe Ser Lys Glu Asp Glu ValLys Ile Trp His Gln Arg Phe Gly His 50 55 60 Leu His Leu Gly Gly Met LysLys Ile Ile Asp Lys Gly Ala Val Arg 65 70 75 80 Gly Ile Pro Asn Leu LysIle Glu Glu Gly Arg Ile Cys Gly Glu Cys 85 90 95 Gln Ile Gly Lys Gln ValLys Met Ser Asn Gln Lys Leu Gln His Gln 100 105 110 Thr Thr Ser Arg ValLeu Glu Leu Leu His Met Asp Leu Met Gly Pro 115 120 125 Met Gln Val GluSer Leu Gly Arg Lys Arg Tyr Ala Tyr Val Val Val 130 135 140 Asp Asp PheSer Arg Phe Thr Trp Val Asn Phe Ile Arg Glu Lys Ser 145 150 155 160 AspThr Phe Glu Val Phe Lys Glu Leu Ser Leu Arg Leu Gln Arg Glu 165 170 175Lys Asp Cys Val Ile Lys Arg Ile Arg Ser Asp His Gly Arg Glu Phe 180 185190 Glu Asn Ser Lys Phe Thr Glu Phe Cys Thr Ser Glu Gly Ile Thr His 195200 205 Glu Phe Ser Ala Ala Ile Thr Pro Gln Gln Asn Gly Ile Val Glu Arg210 215 220 Lys Asn Arg Thr Leu Pro Glu Ala 
Ala Arg Val Met Leu His AlaLys 225 230 235 240 Glu Leu Pro Tyr Asn Leu Trp Ala Glu Ala Met Asn ThrAla Cys Tyr 245 250 255 Ile His Asn Arg Val Thr Leu Arg Arg Gly Thr ProThr Thr Leu Tyr 260 265 270 Glu Ile Trp Lys Gly Arg Lys Pro Thr Val LysHis Phe His Ile Cys 275 280 285 Gly Ser Pro Cys Tyr Ile Leu Ala Asp ArgGlu Gln Arg Arg Lys Met 290 295 300 Asp Pro Lys Ser Asp Ala Gly Ile PheLeu Gly Tyr Ser Thr Asn Ser 305 310 315 320 Arg Ala Tyr Arg Val Phe AsnSer Arg Thr Arg Thr Val Met Glu Ser 325 330 335 Ile Asn Val Val Val AspAsp Leu Thr Pro Ala Arg Lys Lys Asp Val 340 345 350 Glu Glu Asp Val ArgThr Ser Gly Asp Asn Val Ala Asp Thr Ala Lys 355 360 365 Ser Ala Glu AsnAla Glu Asn Ser Asp Ser Ala Thr Asp Glu Pro Asn 370 375 380 Ile Asn GlnPro Asp Lys Arg Pro Ser Ile Arg Ile Gln Lys Met His 385 390 395 400 ProLys Glu Leu Ile Ile Gly Asp Pro Asn Arg Gly Val Thr Thr Arg 405 410 415Ser Arg Glu Ile Glu Ile Ile Ser Asn Ser Cys Phe Val Ser Lys Ile 420 425430 Glu Pro Lys Asn Val Lys Glu Ala Leu Thr Asp Glu Phe Trp Ile Asn 435440 445 Ala Met Gln Glu Glu Leu Glu Gln Phe Lys Arg Asn Glu Val Trp Glu450 455 460 Leu Val Pro Arg Pro Glu Gly Thr Asn Val Ile Gly Thr Lys TrpIle 465 470 475 480 Phe Lys Asn Lys Thr Asn Glu Glu Gly Val Ile Thr ArgAsn Lys Ala 485 490 495 Arg Leu Val Ala Gln Gly Tyr Thr Gln Ile Glu GlyVal Asp Phe Asp 500 505 510 Glu Thr Phe Ala Pro Gly Ala Lys Leu Glu SerIle Arg Leu Leu Leu 515 520 525 Gly Val Ala Cys Ile Leu Lys Phe Lys LeuTyr Gln Met Asp Val Lys 530 535 540 Ser Ala Phe Leu Asn Gly Tyr Leu AsnGlu Glu Ala Tyr Val Glu Gln 545 550 555 560 Pro Lys Gly Phe Val Asp ProThr His Pro Asp His Val Tyr Arg Leu 565 570 575 Lys Lys Leu Cys Tyr GlyLeu Lys Gln Ala Ser Arg Ala Trp Tyr Glu 580 585 590 Arg Leu Thr Glu PheLeu Thr Gln Gln Gly Tyr Arg Lys Gly Gly Ile 595 600 605 Asp Lys Thr LeuPhe Val Lys Gln Asp Ala Gly Lys Leu Met Ile Ala 610 615 620 Gln Ile TyrVal Asp Asp Ile Val Phe Gly Gly Met Leu Asn Glu Met 625 630 635 640 LeuArg His Phe Val Gln Gln Met Gln Phe Glu Phe Glu Met Ser 
Phe 645 650 655Val Gly Glu Leu Asn Tyr Phe Leu Gly Ile Gln Val Lys Gln Met Glu 660 665670 Glu Ser Ile Phe Leu Ser Gln Ser Lys Tyr Ala Lys Asn Ile Val Lys 675680 685 Lys Phe Gly Met Glu Asn Ala Ser His Lys Arg Thr Pro Ala Pro Asn690 695 700 Gln Leu Lys Leu Ser Lys Asp Glu Ala Gly Thr Ser Val Asp GlnSer 705 710 715 720 Leu Tyr Arg Ser Met Ile Gly Ser Leu Ile Tyr Leu ThrAla Ser Arg 725 730 735 Pro Asp Ile Thr Tyr Ala Val Gly Gly Cys Ala ArgTyr Gln Ala Asn 740 745 750 Pro Lys Ile Ser His Leu Asn Gln Val Lys ArgIle Leu Lys Tyr Val 755 760 765 Asn Gly Thr Ser Asp Tyr Gly Ile Met TyrCys His Cys Ser Asp Ser 770 775 780 Met Leu Val Gly Tyr Cys Asp Ala AspTrp Ala Gly Ser Val Asp Asp 785 790 795 800 Arg Lys Ser Thr Phe Gly GlyCys Phe Tyr Leu Gly Thr Asn Phe Ile 805 810 815 Ser Trp Phe Ser Lys LysGln Asn Cys Val Ser Leu Ser Thr Ala Glu 820 825 830 Ala Glu Tyr Ile AlaAla Gly Ser Ser Cys Ser Gln Leu Val Trp Met 835 840 845 Lys Gln Met LeuLys Glu Tyr Asn Val Glu Gln Asp Val Met Thr Leu 850 855 860 Tyr Cys AspAsn Leu Ser Ala Ile Asn Ile Ser Lys Asn Pro Val Gln 865 870 875 880 HisSer Arg Thr Lys His Ile Asp Ile Arg His His Tyr Ile Arg Asp 885 890 895Leu Val Asp Asp Lys Val Ile Thr Leu Glu His Val Asp Thr Glu Glu 900 905910 Gln Ile Ala Asp Ile Phe Thr Lys Ala Leu Asp Ala Asn Gln Phe Glu 915920 925 Lys Leu Arg Gly Lys Leu Gly Ile Cys Leu Leu Glu Asp Leu 930 935940 52 400 PRT Glycine max 52 Asp Glu Gly Phe Asn Val Asp Phe Thr GluSer Glu Cys Leu Met Thr 1 5 10 15 Lys Glu Lys Arg Glu Val Leu Met LysGly Gly Arg Ser Lys Asp Asn 20 25 30 Cys Tyr Leu Trp Thr Pro Gln Glu ThrSer Tyr Ser Ser Thr Cys Leu 35 40 45 Phe Ser Lys Glu Asp Glu Val Lys IleTrp His Gln Arg Phe Gly His 50 55 60 Leu His Leu Gly Gly Met Lys Lys IleIle Asp Lys Gly Ala Val Arg 65 70 75 80 Gly Ile Pro Asn Leu Lys Ile GluGlu Gly Arg Ile Cys Gly Glu Cys 85 90 95 Gln Ile Gly Lys Gln Val Lys MetSer Asn Gln Lys Leu Gln His Gln 100 105 110 Thr Thr Ser Arg Val Leu GluLeu Leu His Met Asp Leu Met Gly Pro 115 120 125 Met Gln Val Glu Ser 
LeuGly Arg Lys Arg Tyr Ala Tyr Val Val Val 130 135 140 Asp Asp Phe Ser ArgPhe Thr Trp Val Asn Phe Ile Arg Glu Lys Ser 145 150 155 160 Asp Thr PheGlu Val Phe Lys Glu Leu Ser Leu Arg Leu Gln Arg Glu 165 170 175 Lys AspCys Val Ile Lys Arg Ile Arg Ser Asp His Gly Arg Glu Phe 180 185 190 GluAsn Ser Lys Phe Thr Glu Phe Cys Thr Ser Glu Gly Ile Thr His 195 200 205Glu Phe Ser Ala Ala Ile Thr Pro Gln Gln Asn Gly Ile Val Glu Arg 210 215220 Lys Asn Arg Thr Leu Pro Glu Ala Ala Arg Val Met Leu His Ala Lys 225230 235 240 Glu Leu Pro Tyr Asn Leu Trp Ala Glu Ala Met Asn Thr Ala CysTyr 245 250 255 Ile His Asn Arg Val Thr Leu Arg Arg Gly Thr Pro Thr ThrLeu Tyr 260 265 270 Glu Ile Trp Lys Gly Arg Lys Pro Thr Val Lys His PheHis Ile Cys 275 280 285 Gly Ser Pro Cys Tyr Ile Leu Ala Asp Arg Glu GlnArg Arg Lys Met 290 295 300 Asp Pro Lys Ser Asp Ala Gly Ile Phe Leu GlyTyr Ser Thr Asn Ser 305 310 315 320 Arg Ala Tyr Arg Val Phe Asn Ser ArgThr Arg Thr Val Met Glu Ser 325 330 335 Ile Asn Val Val Val Asp Asp LeuThr Pro Ala Arg Lys Lys Asp Val 340 345 350 Glu Glu Asp Val Arg Thr SerGly Asp Asn Val Ala Asp Thr Ala Lys 355 360 365 Ser Ala Glu Asn Ala GluAsn Ser Asp Ser Ala Thr Asp Glu Pro Asn 370 375 380 Ile Asn Gln Pro AspLys Arg Pro Ser Ile Arg Ile Gln Lys Met His 385 390 395 400 53 381 PRTGlycine max 53 Pro Lys Glu Leu Ile Ile Gly Asp Pro Asn Arg Gly Val ThrThr Arg 1 5 10 15 Ser Arg Glu Ile Glu Ile Ile Ser Asn Ser Cys Phe ValSer Lys Ile 20 25 30 Glu Pro Lys Asn Val Lys Glu Ala Leu Thr Asp Glu PheTrp Ile Asn 35 40 45 Ala Met Gln Glu Glu Leu Glu Gln Phe Lys Arg Asn GluVal Trp Glu 50 55 60 Leu Val Pro Arg Pro Glu Gly Thr Asn Val Ile Gly ThrLys Trp Ile 65 70 75 80 Phe Lys Asn Lys Thr Asn Glu Glu Gly Val Ile ThrArg Asn Lys Ala 85 90 95 Arg Leu Val Ala Gln Gly Tyr Thr Gln Ile Glu GlyVal Asp Phe Asp 100 105 110 Glu Thr Phe Ala Pro Gly Ala Lys Leu Glu SerIle Arg Leu Leu Leu 115 120 125 Gly Val Ala Cys Ile Leu Lys Phe Lys LeuTyr Gln Met Asp Val Lys 130 135 140 Ser Ala Phe Leu Asn Gly Tyr Leu AsnGlu Glu 
Ala Tyr Val Glu Gln 145 150 155 160 Pro Lys Gly Phe Val Asp ProThr His Pro Asp His Val Tyr Arg Leu 165 170 175 Lys Lys Leu Cys Tyr GlyLeu Lys Gln Ala Ser Arg Ala Trp Tyr Glu 180 185 190 Arg Leu Thr Glu PheLeu Thr Gln Gln Gly Tyr Arg Lys Gly Gly Ile 195 200 205 Asp Lys Thr LeuPhe Val Lys Gln Asp Ala Gly Lys Leu Met Ile Ala 210 215 220 Gln Ile TyrVal Asp Asp Ile Val Phe Gly Gly Met Leu Asn Glu Met 225 230 235 240 LeuArg His Phe Val Gln Gln Met Gln Phe Glu Phe Glu Met Ser Phe 245 250 255Val Gly Glu Leu Asn Tyr Phe Leu Gly Ile Gln Val Lys Gln Met Glu 260 265270 Glu Ser Ile Phe Leu Ser Gln Ser Lys Tyr Ala Lys Asn Ile Val Lys 275280 285 Lys Phe Gly Met Glu Asn Ala Ser His Lys Arg Thr Pro Ala Pro Asn290 295 300 Gln Leu Lys Leu Ser Lys Asp Glu Ala Gly Thr Ser Val Asp GlnSer 305 310 315 320 Leu Tyr Arg Ser Met Ile Gly Ser Leu Ile Tyr Leu ThrAla Ser Arg 325 330 335 Pro Asp Ile Thr Tyr Ala Val Gly Gly Cys Ala ArgTyr Gln Ala Asn 340 345 350 Pro Lys Ile Ser His Leu Asn Gln Val Lys ArgIle Leu Lys Tyr Val 355 360 365 Asn Gly Thr Ser Asp Tyr Gly Ile Met TyrCys His Cys 370 375 380 54 166 PRT Glycine max SITE (162)..(162) X= anyamino acid 54 Ser Asp Ser Met Leu Val Gly Tyr Cys Asp Ala Asp Trp AlaGly Ser 1 5 10 15 Val Asp Asp Arg Lys Ser Thr Phe Gly Gly Cys Phe TyrLeu Gly Thr 20 25 30 Asn Phe Ile Ser Trp Phe Ser Lys Lys Gln Asn Cys ValSer Leu Ser 35 40 45 Thr Ala Glu Ala Glu Tyr Ile Ala Ala Gly Ser Ser CysSer Gln Leu 50 55 60 Val Trp Met Lys Gln Met Leu Lys Glu Tyr Asn Val GluGln Asp Val 65 70 75 80 Met Thr Leu Tyr Cys Asp Asn Leu Ser Ala Ile AsnIle Ser Lys Asn 85 90 95 Pro Val Gln His Ser Arg Thr Lys His Ile Asp IleArg His His Tyr 100 105 110 Ile Arg Asp Leu Val Asp Asp Lys Val Ile ThrLeu Glu His Val Asp 115 120 125 Thr Glu Glu Gln Ile Ala Asp Ile Phe ThrLys Ala Leu Asp Ala Asn 130 135 140 Gln Phe Glu Lys Leu Arg Gly Lys LeuGly Ile Cys Leu Leu Glu Asp 145 150 155 160 Leu Xaa Asn Pro Xaa Pro 16555 613 PRT Glycine max 55 Thr Leu Ile Ala Arg Ser Leu Leu Gly Gln AsnLys Phe Asp Arg Cys 1 
5 10 15 Phe Thr Arg Pro Ser Thr Phe Leu Ile GlnThr His Ile Phe Val Val 20 25 30 Ile Ser Phe Ser Ala Phe Pro Asn Ser SerGln Arg Phe Thr Lys Pro 35 40 45 Phe Gln Arg Leu Cys Phe Ser Met Ala ThrSer Pro Lys Asp Thr Ser 50 55 60 Ser Pro Gly Ser Pro Ser Val Pro Ser SerPro Ser Ser Thr Lys Ala 65 70 75 80 Pro Ser Asn Gln Glu Gln Pro Glu PheHis Ile Gln Pro Ile Gln Met 85 90 95 Ile Pro Gly Leu Ala Pro Val Pro GluLys Leu Val Pro Ile Arg Gln 100 105 110 Gln Gly Val Lys Ile Ser Glu AsnPro Ser Ile Ala Thr Ser Pro Arg 115 120 125 Glu Leu Thr Arg Glu Met AspLys Lys Ile Arg Ser Ile Val Ser Ser 130 135 140 Ile Leu Lys Asn Ala SerVal Pro Asp Ala Asp Lys Asp Val Pro Thr 145 150 155 160 Ser Ser Thr ProAsn Ala Glu Val Leu Ser Ser Ser Ser Lys Glu Glu 165 170 175 Ser Thr GluGlu Glu Glu Gln Ala Thr Glu Glu Thr Pro Ala Pro Arg 180 185 190 Ala ProGlu Pro Ala Pro Gly Asp Leu Ile Asp Leu Glu Glu Val Glu 195 200 205 SerAsp Glu Glu Pro Ile Ala Asn Lys Leu Ala Pro Gly Ile Ala Glu 210 215 220Arg Leu Gln Ser Arg Lys Gly Lys Thr Pro Ile Thr Arg Ser Gly Arg 225 230235 240 Ile Lys Thr Met Ala Gln Lys Lys Ser Thr Pro Ile Thr Pro Thr Thr245 250 255 Ser Arg Trp Ser Lys Val Ala Ile Pro Ser Lys Lys Arg Lys GluPhe 260 265 270 Ser Ser Ser Asp Ser Asp Asp Asp Val Glu Leu Asp Val ProAsp Ile 275 280 285 Lys Arg Ala Lys Lys Ser Gly Lys Lys Val Pro Gly AsnVal Pro Asp 290 295 300 Ala Pro Leu Asp Asn Ile Ser Phe His Ser Ile GlyAsn Val Glu Arg 305 310 315 320 Trp Lys Phe Val Tyr Gln Arg Arg Leu AlaLeu Glu Arg Glu Leu Gly 325 330 335 Arg Asp Ala Leu Asp Cys Lys Glu IleMet Asp Leu Ile Lys Gly Cys 340 345 350 Trp Thr Ala Glu Asn Ser His GlnVal Gly Arg Cys Tyr Glu Ser Leu 355 360 365 Val Arg Glu Phe Ile Val AsnIle Pro Ser Asp Ile Thr Asn Arg Lys 370 375 380 Ser Asp Glu Tyr Gln LysVal Phe Val Arg Gly Lys Cys Val Arg Phe 385 390 395 400 Ser Pro Ala ValIle Asn Lys Tyr Leu Gly Arg Pro Thr Glu Gly Val 405 410 415 Val Asp IleAla Val Ser Glu His Gln Ile Ala Lys Glu Ile Thr Ala 420 425 430 Lys GlnVal Gln His Trp Pro Lys Lys Gly 
Lys Leu Ser Ala Gly Lys 435 440 445 LeuSer Val Lys Tyr Ala Ile Leu His Arg Ile Gly Ala Ala Asn Trp 450 455 460Val Pro Thr Asn His Thr Ser Thr Val Ala Thr Gly Leu Gly Lys Phe 465 470475 480 Leu Tyr Ala Val Gly Thr Lys Ser Lys Phe Asn Phe Gly Lys Tyr Ile485 490 495 Phe Asp Gln Thr Val Lys His Ser Glu Ser Phe Ala Val Lys LeuPro 500 505 510 Ile Ala Phe Pro Thr Val Leu Cys Gly Ile Met Leu Ser GlnHis Pro 515 520 525 Asn Ile Leu Asn Asn Ile Asp Ser Val Met Lys Lys GluSer Ala Leu 530 535 540 Ser Leu His Tyr Lys Leu Phe Glu Gly Thr His ValPro Asp Ile Val 545 550 555 560 Ser Thr Ser Gly Lys Ala Ala Ala Ser GlyAla Val Ser Lys Gly Cys 565 570 575 Phe Asp Cys Thr Gln Gly His Met GlnGly Ala Gly Ser Asn His Gln 580 585 590 Ser His His Arg Lys Lys Asn GlyAla Gly Thr Pro Asp Gln Lys Thr 595 600 605 Leu Arg Gln Trp His 610 56183 DNA Glycine max 56 gttgctgcac aatgcacaag gcaagataaa agaagtgaagctgcaggatc cacgatgtcg 60 gatacgatgt ccaagacatc tggcccgaaa atactggacacataaatctg ttatatcttt 120 aacagattat tgtgcagtta gcaacaggtt agacgatctatctttaggaa cgaactcttc 180 tag 183 57 138 DNA Glycine max 57 gacttcgttatgtcaaggaa taagatcggg ctgcacaatg cacaaggcaa gataaaatgt 60 caaatgaagaattgaagctg caggatccat gatgtcggat acaatgtcca ggacatcctg 120 cccgaaaatactggagtt 138 58 220 DNA Glycine max 58 tccaacgtta tgtcaaggaa tcagattgggctccacaatg cacaaggcaa gataaaaggt 60 caaatgaaga attgaagctg caggatccacgatgtcggat acaatgtcca ggacatcctg 120 cccgaaaata ctggacacat aaatctgttatatctttaac agattaatgt gcagttagca 180 acagatttgg cgatctatct ttaggaacgaattaaaagat 220 59 579 PRT Glycine max 59 Thr Leu Ile Ala Arg Ser Leu LeuGly Gln Asn Lys Phe Asp Arg Cys 1 5 10 15 Phe Thr Arg Pro Ser Thr PheLeu Ile Gln Thr His Ile Phe Val Val 20 25 30 Ile Ser Phe Ser Ala Phe ProAsn Ser Ser Gln Arg Phe Thr Lys Pro 35 40 45 Phe Gln Arg Leu Cys Phe SerMet Ala Thr Ser Pro Lys Asp Thr Ser 50 55 60 Ser Pro Gly Ser Pro Ser ValPro Ser Ser Pro Ser Ser Thr Lys Ala 65 70 75 80 Pro Ser Asn Gln Glu GlnPro Glu Phe His Ile Gln Pro Ile Gln Met 85 90 95 Ile Pro Gly Leu 
Ala ProVal Pro Glu Lys Leu Val Pro Ile Arg Gln 100 105 110 Gln Gly Val Lys IleSer Glu Asn Pro Ser Ile Ala Thr Ser Pro Arg 115 120 125 Glu Leu Thr ArgGlu Met Asp Lys Lys Ile Arg Ser Ile Val Ser Ser 130 135 140 Ile Leu LysAsn Ala Ser Val Pro Asp Ala Asp Lys Asp Val Pro Thr 145 150 155 160 SerSer Thr Pro Asn Ala Glu Val Leu Ser Ser Ser Ser Lys Glu Glu 165 170 175Ser Thr Glu Glu Glu Glu Gln Ala Thr Glu Glu Thr Pro Ala Pro Arg 180 185190 Ala Pro Glu Pro Ala Pro Gly Asp Leu Ile Asp Leu Glu Glu Val Glu 195200 205 Ser Asp Glu Glu Pro Ile Ala Asn Lys Leu Ala Pro Gly Ile Ala Glu210 215 220 Arg Leu Gln Ser Arg Lys Gly Lys Thr Pro Ile Thr Arg Ser GlyArg 225 230 235 240 Ile Lys Thr Met Ala Gln Lys Lys Ser Thr Pro Ile ThrPro Thr Thr 245 250 255 Ser Arg Trp Ser Lys Val Ala Ile Pro Ser Lys LysArg Lys Glu Phe 260 265 270 Ser Ser Ser Asp Ser Asp Asp Asp Val Glu LeuAsp Val Pro Asp Ile 275 280 285 Lys Arg Ala Lys Lys Ser Gly Lys Lys ValPro Gly Asn Val Pro Asp 290 295 300 Ala Pro Leu Asp Asn Ile Ser Phe HisSer Ile Gly Asn Val Glu Arg 305 310 315 320 Trp Lys Phe Val Tyr Gln ArgArg Leu Ala Leu Glu Arg Glu Leu Gly 325 330 335 Arg Asp Ala Leu Asp CysLys Glu Ile Met Asp Leu Ile Lys Gly Cys 340 345 350 Trp Thr Ala Glu AsnSer His Gln Val Gly Arg Cys Tyr Glu Ser Leu 355 360 365 Val Arg Glu PheIle Val Asn Ile Pro Ser Asp Ile Thr Asn Arg Lys 370 375 380 Ser Asp GluTyr Gln Lys Val Phe Val Arg Gly Lys Cys Val Arg Phe 385 390 395 400 SerPro Ala Val Ile Asn Lys Tyr Leu Gly Arg Pro Thr Glu Gly Val 405 410 415Val Asp Ile Ala Val Ser Glu His Gln Ile Ala Lys Glu Ile Thr Ala 420 425430 Lys Gln Val Gln His Trp Pro Lys Lys Gly Lys Leu Ser Ala Gly Lys 435440 445 Leu Ser Val Lys Tyr Ala Ile Leu His Arg Ile Gly Ala Ala Asn Trp450 455 460 Val Pro Thr Asn His Thr Ser Thr Val Ala Thr Gly Leu Gly LysPhe 465 470 475 480 Leu Tyr Ala Val Gly Thr Lys Ser Lys Phe Asn Phe GlyLys Tyr Ile 485 490 495 Phe Asp Gln Thr Val Lys His Ser Glu Ser Phe AlaVal Lys Leu Pro 500 505 510 Ile Ala Phe Pro Thr Val Leu Cys Gly Ile MetLeu 
Ser Gln His Pro 515 520 525 Asn Ile Leu Asn Asn Ile Asp Ser Val Met Lys Lys Glu Ser Ala Leu 530 535 540 Ser Leu His Tyr Lys Leu Phe Glu Gly Thr His Val Pro Asp Ile Val 545 550 555 560 Ser Thr Ser Gly Lys Ala Ala Ala Ser Gly Ala Val Ser Lys Gly Cys 565 570 575 Phe Asp Cys 60 14 PRT Artificial sequence, synthetic peptide 60 Cys Xaa Xaa Cys Xaa Xaa Xaa Xaa His Xaa Xaa Xaa Xaa Cys 1 5 10 61 14 PRT Glycine max 61 Cys His Tyr Cys Gly Lys Tyr Gly His Ile Lys Pro Phe Cys 1 5 10 62 14 PRT Lilium henryi 62 Cys Tyr Ser Cys Gly Gln Pro Gly His Phe Lys Ala Asn Cys 1 5 10 63 14 PRT Drosophila melanogaster 63 Cys His His Cys Gly Arg Glu Gly His Ile Lys Lys Asp Cys 1 5 10 64 14 PRT Arabidopsis thaliana 64 Cys Trp Tyr Cys Lys Lys Glu Gly His Val Lys Lys Asp Cys 1 5 10 65 14 PRT Nicotiana tabacum 65 Cys Tyr Asn Cys Val Lys Pro Gly His Phe Lys Arg Asp Cys 1 5 10 66 14 PRT HIV 1 66 Cys Trp Lys Cys Gly Lys Pro Gly His Ile Met Thr Asn Cys 1 5 10 67 14 PRT Solanum tuberosum 67 Cys Asp His Cys Lys Lys Tyr Trp His Thr Arg Glu Thr Cys 1 5 10 68 14 PRT Cauliflower mosaic 68 Cys Trp Ile Cys Asn Ile Glu Gly His Tyr Ala Asn Glu Cys 1 5 10 69 10 PRT Arabidopsis thaliana 69 Leu Asp Ser Gly Cys Thr Ser His Met Ser 1 5 10 70 10 PRT Nicotiana tabacum 70 Val Asp Thr Ala Ala Ser His His Ala Thr 1 5 10 71 10 PRT Drosophila melanogaster 71 Leu Asp Ser Gly Ala Ser Asp His Leu Thr 1 5 10 72 10 PRT Solanum tuberosum 72 Ile Asp Ser Arg Ala Ser Asp His Met Thr 1 5 10 73 10 PRT Lilium henryi 73 Ile Asp Thr Gly Ser Thr His Ser Phe Ile 1 5 10 74 10 PRT Cauliflower mosaic 74 Val Asp Thr Gly Ala Ser Leu Cys Ile Ala 1 5 10 75 10 PRT HIV 1 75 Leu Asp Thr Gly Arg Asp Asp Thr Val Leu 1 5 10 76 22 RNA Glycine max misc_feature 3′ end of soybean tRNA met 1 (sequence is identified in the 5′ to 3′ direction in the sequence listing and in the 3′ to 5′ direction in Figure 11) 76 ucgaaaccug gcucugauac ca 22 77 20 DNA Solanum tuberosum 77 ttgcagtatc taaactttca 20 78 65 PRT Drosophila melanogaster 78 His Lys Arg Ala Lys His Ile Asp Ile Lys Tyr 
His Phe Ala Arg Glu 1 510 15 Gln Val Gln Asn Asn Val Ile Cys Leu Glu Tyr Ile Pro Thr Glu Asn 2025 30 Gln Leu Ala Asp Ile Phe Thr Lys Pro Leu Pro Ala Ala Arg Phe Val 3540 45 Glu Leu Arg Asp Lys Leu Gly Leu Leu Gln Asp Asp Gln Ser Asn Ala 5055 60 Glu 65 79 441 PRT Zea mays SITE (1)..(441) amino acid positions 86526 of Opie 2 retroelement 79 Asn Met Gly Tyr Asn Cys Leu Phe Thr AsnIle Asp Val Ser Val Phe 1 5 10 15 Arg Arg Cys Asp Gly Ser Leu Ala PheLys Gly Val Leu Asp Gly Lys 20 25 30 Leu Tyr Leu Val Asp Phe Ala Lys GluGlu Ala Gly Leu Asp Ala Cys 35 40 45 Leu Ile Ala Lys Thr Ser Met Gly TrpLeu Trp His Arg Arg Leu Ala 50 55 60 His Val Gly Met Lys Asn Leu His LysLeu Leu Lys Gly Glu His Val 65 70 75 80 Ile Gly Leu Thr Asn Val Gln PheGlu Lys Asp Arg Pro Cys Ala Ala 85 90 95 Cys Gln Ala Gly Lys Gln Val GlyGly Ser His His Thr Lys Asn Val 100 105 110 Met Thr Thr Ser Arg Pro LeuGlu Met Leu His Met Asp Leu Phe Gly 115 120 125 Pro Val Ala Tyr Leu SerIle Gly Gly Ser Lys Tyr Gly Leu Val Ile 130 135 140 Val Asp Asp Phe SerArg Phe Thr Trp Val Phe Phe Leu Gln Glu Lys 145 150 155 160 Ser Glu ThrGln Gly Thr Leu Lys Arg Phe Leu Arg Arg Ala Gln Asn 165 170 175 Glu PheGlu Leu Lys Val Lys Lys Ile Arg Ser Asp Asn Gly Ser Glu 180 185 190 PheLys Asn Leu Gln Val Glu Glu Phe Leu Glu Glu Glu Gly Ile Lys 195 200 205His Glu Phe Ser Ala Pro Tyr Thr Pro Gln Gln Asn Gly Val Val Glu 210 215220 Arg Lys Asn Arg Thr Leu Ile Asp Met Ala Arg Thr Met Leu Gly Glu 225230 235 240 Phe Lys Thr Pro Glu Cys Phe Trp Thr Glu Ala Val Asn Thr AlaCys 245 250 255 His Ala Ile Asn Arg Val Tyr Leu His Arg Ile Leu Lys AsnThr Ser 260 265 270 Tyr Glu Leu Leu Thr Gly Asn Lys Pro Asn Val Ser TyrPhe Arg Val 275 280 285 Phe Gly Ser Lys Cys Tyr Ile Leu Val Lys Lys GlyArg Asn Ser Lys 290 295 300 Phe Ala Pro Lys Ala Val Glu Gly Phe Leu LeuGly Tyr Asp Ser Asn 305 310 315 320 Thr Lys Ala Tyr Arg Val Phe Asn LysSer Ser Gly Leu Val Glu Val 325 330 335 Ser Gly Asp Val Val Phe Asp GluThr Asn Gly Ser Pro Arg Glu Gln 340 345 350 Val Val Asp 
Cys Asp Asp ValAsp Glu Glu Asp Ile Pro Thr Ala Ala 355 360 365 Ile Arg Thr Met Ala IleGly Glu Val Arg Pro Gln Glu Gln Asp Glu 370 375 380 Arg Glu Gln Pro SerPro Ser Thr Met Val His Pro Pro Thr Gln Asp 385 390 395 400 Asp Glu GlnVal His Gln Gln Glu Val Cys Asp Gln Gly Gly Ala Gln 405 410 415 Asp AspHis Val Leu Glu Glu Glu Ala Gln Pro Ala Pro Pro Thr Gln 420 425 430 ValArg Ala Met Ile Gln Arg Asp His 435 440 80 380 PRT Zea mays SITE(1)..(380) amino acid positions 527 906 of Opie 2 retroelement 80 ProVal Asp Gln Ile Leu Gly Asp Ile Ser Lys Gly Val Thr Thr Arg 1 5 10 15Ser Arg Leu Val Asn Phe Cys Glu His Asn Ser Phe Val Ser Ser Ile 20 25 30Glu Pro Phe Arg Val Glu Glu Ala Leu Leu Asp Pro Asp Trp Val Leu 35 40 45Ala Met Gln Glu Glu Leu Asn Asn Phe Lys Arg Asn Glu Val Trp Thr 50 55 60Leu Val Pro Arg Pro Lys Gln Asn Val Val Gly Thr Lys Trp Val Phe 65 70 7580 Arg Asn Lys Gln Asp Glu Arg Gly Val Val Thr Arg Asn Lys Ala Arg 85 9095 Leu Val Ala Lys Gly Tyr Ala Gln Val Ala Gly Leu Asp Phe Glu Glu 100105 110 Thr Phe Ala Pro Val Ala Arg Leu Glu Ser Ile Arg Ile Leu Leu Ala115 120 125 Tyr Ala Ala His His Ser Phe Arg Leu Tyr Gln Met Asp Val LysSer 130 135 140 Ala Phe Leu Asn Gly Pro Ile Lys Glu Glu Val Tyr Val GluGln Pro 145 150 155 160 Pro Gly Phe Glu Asp Glu Arg Tyr Pro Asp His ValCys Lys Leu Ser 165 170 175 Lys Ala Leu Tyr Gly Leu Lys Gln Ala Pro ArgAla Trp Tyr Glu Cys 180 185 190 Leu Arg Asp Phe Leu Ile Ala Asn Ala PheLys Val Gly Lys Ala Asp 195 200 205 Pro Thr Leu Phe Thr Lys Thr Cys AspGly Asp Leu Phe Val Cys Gln 210 215 220 Ile Tyr Val Asp Asp Ile Ile PheGly Ser Thr Asn Gln Lys Ser Cys 225 230 235 240 Glu Glu Phe Ser Arg ValMet Thr Gln Lys Phe Glu Met Ser Met Met 245 250 255 Gly Glu Leu Asn TyrPhe Leu Gly Phe Gln Val Lys Gln Leu Lys Asp 260 265 270 Gly Thr Phe IleSer Gln Thr Lys Tyr Thr Gln Asp Leu Leu Lys Arg 275 280 285 Phe Gly MetLys Asp Ala Lys Pro Ala Lys Thr Pro Met Gly Thr Asp 290 295 300 Gly HisThr Asp Leu Asn Lys Gly Gly Lys Ser Val Asp Gln Lys Ala 305 310 315 320Tyr 
Arg Ser Met Ile Gly Ser Leu Leu Tyr Leu Cys Ala Ser Arg Pro 325 330335 Asp Ile Met Leu Ser Val Cys Met Cys Ala Arg Phe Gln Ser Asp Pro 340345 350 Lys Glu Cys His Leu Val Ala Val Lys Arg Ile Leu Arg Tyr Leu Val355 360 365 Ala Thr Pro Cys Phe Gly Leu Trp Tyr Pro Lys Gly 370 375 38081 168 PRT Zea mays SITE (1)..(168) nucleotide positions 901 1068 ofOpie 2 nucleotide sequence 81 Leu Trp Tyr Pro Lys Gly Ser Thr Phe AspLeu Val Gly Tyr Ser Asp 1 5 10 15 Ser Asp Tyr Ala Gly Cys Lys Val AspArg Lys Ser Thr Ser Gly Thr 20 25 30 Cys Gln Phe Leu Gly Arg Ser Leu ValSer Trp Asn Ser Lys Lys Gln 35 40 45 Thr Ser Val Ala Leu Ser Thr Ala GluAla Glu Tyr Val Ala Ala Gly 50 55 60 Gln Cys Cys Ala Gln Leu Leu Trp MetArg Gln Thr Leu Arg Asp Phe 65 70 75 80 Gly Tyr Asn Leu Ser Lys Val ProLeu Leu Cys Asp Asn Glu Ser Ala 85 90 95 Ile Arg Met Ala Glu Asn Pro ValGlu His Ser Arg Thr Lys His Ile 100 105 110 Asp Ile Arg His His Phe LeuArg Asp His Gln Gln Lys Gly Asp Ile 115 120 125 Glu Val Phe His Val SerThr Glu Asn Gln Leu Ala Asp Ile Phe Thr 130 135 140 Lys Pro Leu Asp GluLys Thr Phe Cys Arg Leu Arg Ser Glu Leu Asn 145 150 155 160 Val Leu AspSer Arg Asn Leu Asp 165 82 4 PRT Artificial sequence Synthetic peptide82 Lys Lys Gly Lys 1 83 647 PRT Glycine max 83 Thr Leu Ile Ala Arg SerLeu Leu Gly Gln Asn Lys Phe Asp Arg Cys 1 5 10 15 Phe Thr Arg Pro SerThr Phe Leu Ile Gln Thr His Ile Phe Val Val 20 25 30 Ile Ser Phe Ser AlaPhe Pro Asn Ser Ser Gln Arg Phe Thr Lys Pro 35 40 45 Phe Gln Arg Leu CysPhe Ser Met Ala Thr Ser Pro Lys Asp Thr Ser 50 55 60 Ser Pro Gly Ser ProSer Val Pro Ser Ser Pro Ser Ser Thr Lys Ala 65 70 75 80 Pro Ser Asn GlnGlu Gln Pro Glu Phe His Ile Gln Pro Ile Gln Met 85 90 95 Ile Pro Gly GlnAla Pro Val Pro Glu Lys Leu Val Pro Lys Arg Gln 100 105 110 Gln Gly ValLys Ile Ser Glu Asn Pro Ser Ile Ala Thr Ser Pro Arg 115 120 125 Val AspThr Glu Met Asp Lys Lys Ile Arg Ser Ile Val Ser Ser Ile 130 135 140 LeuLys Asn Ala Ser Val Pro Asp Ala Asp Lys Asp Val Pro Thr Ser 145 150 155160 Ser Thr Pro 
Asn Ala Glu Val Leu Ser Ser Ser Ser Lys Glu Glu Ser 165170 175 Thr Glu Glu Glu Glu Gln Ala Thr Glu Glu Thr Pro Ala Pro Arg Ala180 185 190 Pro Glu Pro Ala Pro Gly Asp Leu Ile Asp Leu Glu Glu Val GluSer 195 200 205 Asp Glu Glu Pro Ile Ala Asn Lys Leu Ala Pro Gly Ile AlaGlu Arg 210 215 220 Leu Gln Ser Arg Lys Gly Lys Thr Pro Ile Thr Arg SerGly Arg Ile 225 230 235 240 Lys Thr Met Ala Gln Lys Lys Ser Thr Pro IleThr Pro Thr Thr Ser 245 250 255 Arg Trp Ser Lys Val Ala Ile Pro Ser LysLys Arg Lys Glu Phe Ser 260 265 270 Ser Ser Asp Ser Asp Asp Asp Val GluLeu Asp Val Pro Asp Ile Lys 275 280 285 Arg Ala Lys Lys Ser Gly Lys LysVal Pro Gly Asn Val Pro Asp Ala 290 295 300 Pro Leu Asp Asn Ile Ser PheHis Ser Ile Gly Asn Val Glu Arg Trp 305 310 315 320 Lys Phe Val Tyr GlnArg Arg Leu Ala Leu Glu Arg Glu Leu Gly Arg 325 330 335 Asp Ala Leu AspCys Lys Glu Ile Met Asp Leu Ile Lys Ala Ala Gly 340 345 350 Leu Leu LysThr Val Thr Lys Leu Gly Asp Cys Tyr Glu Ser Leu Val 355 360 365 Arg GluPhe Ile Val Asn Ile Pro Ser Asp Ile Thr Asn Arg Lys Ser 370 375 380 AspGlu Tyr Gln Lys Val Phe Val Arg Gly Lys Cys Val Arg Phe Ser 385 390 395400 Pro Ala Val Ile Asn Lys Tyr Leu Gly Arg Pro Thr Glu Gly Val Val 405410 415 Asp Ile Ala Val Ser Glu His Gln Ile Ala Lys Glu Ile Thr Ala Lys420 425 430 Gln Val Gln His Trp Pro Lys Lys Gly Lys Leu Ser Ala Gly LysLeu 435 440 445 Ser Val Lys Tyr Ala Ile Leu His Arg Ile Gly Ala Ala AsnTrp Val 450 455 460 Pro Thr Asn His Thr Ser Thr Val Ala Thr Gly Leu GlyLys Phe Leu 465 470 475 480 Tyr Ala Val Gly Thr Lys Ser Lys Phe Asn PheGly Lys Tyr Ile Phe 485 490 495 Asp Gln Thr Val Lys His Ser Glu Ser PheAla Val Lys Leu Pro Ile 500 505 510 Ala Phe Pro Thr Val Leu Cys Gly IleMet Leu Ser Gln His Pro Asn 515 520 525 Ile Leu Asn Asn Ile Asp Ser ValMet Lys Arg Glu Ser Ala Leu Ser 530 535 540 Leu His Tyr Lys Leu Phe GluGly Thr His Val Pro Asp Ile Val Ser 545 550 555 560 Thr Ser Gly Lys AlaAla Ala Ser Gly Ala Val Ser Lys Asp Ala Leu 565 570 575 Ile Ala Glu LeuLys Asp Thr Cys Lys Val Leu 
Glu Ala Thr Ile Lys 580 585 590 Ala Thr ThrGlu Lys Lys Met Glu Leu Glu Arg Leu Ile Lys Arg Leu 595 600 605 Ser AspSer Gly Ile Asp Asp Gly Glu Ala Ala Glu Glu Glu Glu Glu 610 615 620 AlaAla Glu Glu Glu Lys Asp Ala Ala Glu Asp Thr Glu Ser Asp Asp 625 630 635640 Asp Asp Ser Asp Ala Thr Pro 645 84 578 PRT Glycine max 84 Thr LeuIle Ala Arg Ser Leu Leu Gly Gln Asn Lys Phe Asp Arg Cys 1 5 10 15 PheThr Arg Pro Ser Thr Phe Leu Ile Gln Thr His Ile Phe Val Val 20 25 30 IleSer Phe Ser Ala Phe Pro Asn Ser Ser Gln Arg Phe Thr Lys Pro 35 40 45 PheGln Arg Leu Cys Phe Ser Met Ala Thr Ser Pro Lys Asp Thr Ser 50 55 60 SerPro Gly Ser Pro Ser Val Pro Ser Ser Pro Ser Ser Thr Lys Ala 65 70 75 80Pro Ser Asn Gln Glu Gln Pro Glu Phe His Ile Gln Pro Ile Gln Met 85 90 95Ile Pro Gly Leu Ala Pro Val Pro Glu Lys Leu Val Pro Ile Arg Gln 100 105110 Gln Gly Val Lys Ile Ser Glu Asn Pro Ser Ile Ala Thr Ser Pro Arg 115120 125 Glu Leu Thr Arg Glu Met Asp Lys Lys Ile Arg Ser Ile Val Ser Ser130 135 140 Ile Leu Lys Asn Ala Ser Val Pro Asp Ala Asp Lys Asp Val ProThr 145 150 155 160 Ser Ser Thr Pro Asn Ala Glu Val Leu Ser Ser Ser SerLys Glu Glu 165 170 175 Ser Thr Glu Glu Glu Glu Gln Ala Thr Glu Glu ThrPro Ala Pro Arg 180 185 190 Ala Pro Glu Pro Ala Pro Gly Asp Leu Ile AspLeu Glu Glu Val Glu 195 200 205 Ser Asp Glu Glu Pro Ile Ala Asn Lys LeuAla Pro Gly Ile Ala Glu 210 215 220 Arg Leu Gln Ser Arg Lys Gly Lys ThrPro Ile Thr Arg Ser Gly Arg 225 230 235 240 Ile Lys Thr Met Ala Gln LysLys Ser Thr Pro Ile Thr Pro Thr Thr 245 250 255 Ser Arg Trp Ser Lys ValAla Ile Pro Ser Lys Lys Arg Lys Glu Phe 260 265 270 Ser Ser Ser Asp SerAsp Asp Asp Val Glu Leu Asp Val Pro Asp Ile 275 280 285 Lys Arg Ala LysLys Ser Gly Lys Lys Val Pro Gly Asn Val Pro Asp 290 295 300 Ala Pro LeuAsp Asn Ile Ser Phe His Ser Ile Gly Asn Val Glu Arg 305 310 315 320 TrpLys Phe Val Tyr Gln Arg Arg Leu Ala Leu Glu Arg Glu Leu Gly 325 330 335Arg Asp Ala Leu Asp Cys Lys Glu Ile Met Asp Leu Ile Lys Gly Cys 340 345350 Trp Thr Ala Glu Asn Ser His Gln Val 
Gly Arg Cys Tyr Glu Ser Leu 355360 365 Val Arg Glu Phe Ile Val Asn Ile Pro Ser Asp Ile Thr Asn Arg Lys370 375 380 Ser Asp Glu Tyr Gln Lys Val Phe Val Arg Gly Lys Cys Val ArgPhe 385 390 395 400 Ser Pro Ala Val Ile Asn Lys Tyr Leu Gly Arg Pro ThrGlu Gly Val 405 410 415 Val Asp Ile Ala Val Ser Glu His Gln Ile Ala LysGlu Ile Thr Ala 420 425 430 Gln Val Gln His Trp Pro Lys Lys Gly Lys LeuSer Ala Gly Lys Leu 435 440 445 Ser Val Lys Tyr Ala Ile Leu His Arg IleGly Ala Ala Asn Trp Val 450 455 460 Pro Thr Asn His Thr Ser Thr Val AlaThr Gly Leu Gly Lys Phe Leu 465 470 475 480 Tyr Ala Val Gly Thr Lys SerLys Phe Asn Phe Gly Lys Tyr Ile Phe 485 490 495 Asp Gln Thr Val Lys HisSer Glu Ser Phe Ala Val Lys Leu Pro Ile 500 505 510 Ala Phe Pro Pro ValLeu Cys Gly Ile Met Leu Thr Gln His Pro Asn 515 520 525 Ile Leu Asn AsnIle Asp Ser Val Met Lys Lys Glu Ser Ala Leu Ser 530 535 540 Leu His TyrLys Leu Phe Glu Gly Thr His Val Pro Asp Ile Val Ser 545 550 555 560 ThrSer Gly Lys Ala Ala Ala Ser Gly Ala Val Ser Lys Gly Cys Phe 565 570 575Asp Cys 85 8 PRT Glycine max 85 Gln Leu Leu Leu Ser Glu Arg Ala 1 5 8634 PRT Glycine max 86 Thr Gln Gly His Met Gln Gly Ala Gly Ser Asn HisGln Ser His His 1 5 10 15 Arg Lys Lys Asn Gly Ala Gly Thr Pro Asp GlnLys Thr Leu Arg Gln 20 25 30 Trp His 87 9072 DNA Glycine maxmisc_feature SIRE1 7 87 accaaattat aactttgtct tttttcaaag tggttacattagaccattcg ttattactgt 60 tagtgcttag cactactgag tttaaaaagg ttggctaagattttgttaaa acataagcac 120 ttagacaatg aaggaaagct ggagttgctg cacatgatgtccaacgttat gtcaaggaat 180 aagatcgggc tgcataatgc acaaggcaag ataaagtgtcaagtgatgaa ttgaagttga 240 aggatccacg atgtcggata caatgtcctg acatcctgctcgagaatact ggaagtgctg 300 tacaatgcaa gataaaagtc aagtgaagca ttgaagctgcaggatccaag atgtcggata 360 cgatgtcctg acatctggcc cgataatact ggacatataaatctgttata tctttaacag 420 attattgtgc agttagcaag agattagaag atctatctttaggaacgaat taaaagatca 480 ttaaagttcg aatttcaaag tagaagagtt cgttcagggattaaagatta aagattaaag 540 attcaaacta aaagatcaaa agttatcttt tagttctttaactgcagatt 
tttcagaaga 600 agatagatct cctccagcat caagaacttg cagcccagaatcgtacacgg ctatataatc 660 atggaggctg cacgagttct gtaccaagtc cgggattgaagagttaattt gtgagttttt 720 gggacttgag tcttttgtga gccaccttga tggtacccttacatcaagtg ttggacctat 780 gtgtgtagag ttgatctctt gtgtctagag ttgatctctattgtgtaggg ttgatccctt 840 ttgtacagag ttgatctctg atgtgtcttt gaattaattgtaaacacgag agtgtgagtg 900 agagggagtg agcagaggtt ctcatatcta agattgggtcttaggtagag atcgcacggg 960 tagtggttag gtgagaaggt tgtaaacagg ggttgttagaccttgaacta acactattga 1020 gagtggattt cctccctggc ttggtagccc ccagatgtaggtgaggttgc accgaactgg 1080 gtaaacaatt ctcttgtgtt atttacttgt ttaatctgttcatacggaca cacataaact 1140 gcatgttctg aagcatgatg tcgtgacatc ctgtacgacatctgtcccct ggtatcagaa 1200 tttcaattgg tatcagagcc aacactcgaa atcacagagtgagatctggg gagataaatt 1260 ctg atg aac atg gag aaa gaa gga gga cca gtgaac aga cca cca att 1308 Met Asn Met Glu Lys Glu Gly Gly Pro Val Asn ArgPro Pro Ile 1 5 10 15 ctt gat gga agc aac tat gaa tac tgg aaa gca agaatg gtg gcc ttc 1356 Leu Asp Gly Ser Asn Tyr Glu Tyr Trp Lys Ala Arg MetVal Ala Phe 20 25 30 ctc aaa tca ctg gat agc aga acc tgg aaa gct gtc atcaaa ggc tgg 1404 Leu Lys Ser Leu Asp Ser Arg Thr Trp Lys Ala Val Ile LysGly Trp 35 40 45 gaa cat ccc aag atg ctg gac aca gaa gga aag ccc act gatgaa ttg 1452 Glu His Pro Lys Met Leu Asp Thr Glu Gly Lys Pro Thr Asp GluLeu 50 55 60 aag cca gaa gaa gac tgg act aaa gaa gag gac gaa ttg gca cttgga 1500 Lys Pro Glu Glu Asp Trp Thr Lys Glu Glu Asp Glu Leu Ala Leu Gly65 70 75 aac tcc aaa gct ttg aat gca cta ttc aat gga gtt gac aag aac atc1548 Asn Ser Lys Ala Leu Asn Ala Leu Phe Asn Gly Val Asp Lys Asn Ile 8085 90 95 ttc aga ctg atc aac act tgc aca gtg gcc aaa gat gca tgc gag atc1596 Phe Arg Leu Ile Asn Thr Cys Thr Val Ala Lys Asp Ala Cys Glu Ile 100105 110 ctg aaa agc act cat gaa gga acc tcc aaa gtg aag atg tcc aga ttg1644 Leu Lys Ser Thr His Glu Gly Thr Ser Lys Val Lys Met Ser Arg Leu 115120 125 caa ctc ttg gct aca aaa ttc gaa aat ctg aag atg aag gag gaa gag1692 Gln Leu Leu Ala Thr Lys 
Phe Glu Asn Leu Lys Met Lys Glu Glu Glu 130135 140 tgt att cat gac ttc cac atg aac att ctt gaa att gcc aat gct tgc1740 Cys Ile His Asp Phe His Met Asn Ile Leu Glu Ile Ala Asn Ala Cys 145150 155 act gcc ttg gga gag agg ata aca gat gaa aag ctg gtg aga aag atc1788 Thr Ala Leu Gly Glu Arg Ile Thr Asp Glu Lys Leu Val Arg Lys Ile 160165 170 175 ctc aga tcc ttg cct aag aga ttt gac atg aaa gtc act gca atagag 1836 Leu Arg Ser Leu Pro Lys Arg Phe Asp Met Lys Val Thr Ala Ile Glu180 185 190 gag gcc caa gac att tgc aac atg aga gtt gat gaa ctc att ggttct 1884 Glu Ala Gln Asp Ile Cys Asn Met Arg Val Asp Glu Leu Ile Gly Ser195 200 205 ctt caa acc ttt gag cta gga ctc tcg gat agg gct gaa aag aagagc 1932 Leu Gln Thr Phe Glu Leu Gly Leu Ser Asp Arg Ala Glu Lys Lys Ser210 215 220 aag aat cta gct ttc gtg tcc aat gat gaa gga gaa gaa gat gagtat 1980 Lys Asn Leu Ala Phe Val Ser Asn Asp Glu Gly Glu Glu Asp Glu Tyr225 230 235 gac ctg gat act gat gaa ggt ctg aca aat gca gtt gtg ctc cttgga 2028 Asp Leu Asp Thr Asp Glu Gly Leu Thr Asn Ala Val Val Leu Leu Gly240 245 250 255 aag cag ttc aac aaa gtg ctg aac aga atg gac aag agg cagaaa cca 2076 Lys Gln Phe Asn Lys Val Leu Asn Arg Met Asp Lys Arg Gln LysPro 260 265 270 cat gtc cag aac atc cct ttc gac atc agg aaa ggc agt aaatac cag 2124 His Val Gln Asn Ile Pro Phe Asp Ile Arg Lys Gly Ser Lys TyrGln 275 280 285 aaa aga tca gat gta aag ccc agt cac agc aaa gga att caatgc cat 2172 Lys Arg Ser Asp Val Lys Pro Ser His Ser Lys Gly Ile Gln CysHis 290 295 300 ggg tgt gaa ggc tat gga cac atc ata gct gaa tgt ccc actcat ctc 2220 Gly Cys Glu Gly Tyr Gly His Ile Ile Ala Glu Cys Pro Thr HisLeu 305 310 315 aag aag cac agg aaa gga ctc tct gta tgt caa tct gat acagag agt 2268 Lys Lys His Arg Lys Gly Leu Ser Val Cys Gln Ser Asp Thr GluSer 320 325 330 335 gaa caa gaa agt gat tct gac aga gat gtg aat gca ctcatt ggg ata 2316 Glu Gln Glu Ser Asp Ser Asp Arg Asp Val Asn Ala Leu IleGly Ile 340 345 350 ttt gaa act gct gaa gat tca agt gat aca gac agt gaaatc act ttt 2364 Phe Glu 
Thr Ala Glu Asp Ser Ser Asp Thr Asp Ser Glu IleThr Phe 355 360 365 gat gag ctt gct gca tcc tat aga aaa cta tgc atc aaaagt gag aag 2412 Asp Glu Leu Ala Ala Ser Tyr Arg Lys Leu Cys Ile Lys SerGlu Lys 370 375 380 atc ctt cag caa gaa gca caa ctg aag aag gtc att gcagat ctg gaa 2460 Ile Leu Gln Gln Glu Ala Gln Leu Lys Lys Val Ile Ala AspLeu Glu 385 390 395 gct gag aag gag gca cat aaa gag gag atc tct gag cttaaa ggt gaa 2508 Ala Glu Lys Glu Ala His Lys Glu Glu Ile Ser Glu Leu LysGly Glu 400 405 410 415 gtc ggt ttt ctg aac tct aag ctg gaa aac atg acaaaa tca ata aag 2556 Val Gly Phe Leu Asn Ser Lys Leu Glu Asn Met Thr LysSer Ile Lys 420 425 430 atg ctg aac aaa ggc tca gat aca ctt gat gag gtgctg ctg ctt gga 2604 Met Leu Asn Lys Gly Ser Asp Thr Leu Asp Glu Val LeuLeu Leu Gly 435 440 445 aag aat gct gga aac cag aga gga ctt gga ttt aatcct aag tct gct 2652 Lys Asn Ala Gly Asn Gln Arg Gly Leu Gly Phe Asn ProLys Ser Ala 450 455 460 ggc aga aca acc atg aca gaa ttt gtt cct gcc aaaaac agg act gga 2700 Gly Arg Thr Thr Met Thr Glu Phe Val Pro Ala Lys AsnArg Thr Gly 465 470 475 gcc acg atg tca caa cat cgg tct cga cat cat ggaatg cag cag aaa 2748 Ala Thr Met Ser Gln His Arg Ser Arg His His Gly MetGln Gln Lys 480 485 490 495 aag agc aaa aga aag aag tgg agg tgt cac tactgt ggc aag tat ggt 2796 Lys Ser Lys Arg Lys Lys Trp Arg Cys His Tyr CysGly Lys Tyr Gly 500 505 510 cac ata aag ccc ttt tgc tat cat cta cat ggccat cca cat cat gga 2844 His Ile Lys Pro Phe Cys Tyr His Leu His Gly HisPro His His Gly 515 520 525 act caa agc agc aac agc aga aag aag atg atgtgg gtt cca aaa cac 2892 Thr Gln Ser Ser Asn Ser Arg Lys Lys Met Met TrpVal Pro Lys His 530 535 540 aag gct gtc agt ctt gtt gtt cat act tca cttaga gca tca gct aag 2940 Lys Ala Val Ser Leu Val Val His Thr Ser Leu ArgAla Ser Ala Lys 545 550 555 gaa gat tgg tac cta gat agc ggc tgt tcc agacac atg aca gga gtc 2988 Glu Asp Trp Tyr Leu Asp Ser Gly Cys Ser Arg HisMet Thr Gly Val 560 565 570 575 aaa gaa ttc ctg ctg aac att gag ccc tgctcc act agt tat gtg 
aca 3036 Lys Glu Phe Leu Leu Asn Ile Glu Pro Cys SerThr Ser Tyr Val Thr 580 585 590 ttt gga gat ggc tct aaa gga aag atc attgga atg gga aag cta gtt 3084 Phe Gly Asp Gly Ser Lys Gly Lys Ile Ile GlyMet Gly Lys Leu Val 595 600 605 cat gat gga ctt cct agt ctg aac aaa gtactg ctg gtg aag gga ctg 3132 His Asp Gly Leu Pro Ser Leu Asn Lys Val LeuLeu Val Lys Gly Leu 610 615 620 act gca aac ttg att agc atc agt cag ctgtgt gat gaa gga ttc aat 3180 Thr Ala Asn Leu Ile Ser Ile Ser Gln Leu CysAsp Glu Gly Phe Asn 625 630 635 gta aac ttc aca aag tca gaa tgc ttg gtgaca aat gag aag agt gaa 3228 Val Asn Phe Thr Lys Ser Glu Cys Leu Val ThrAsn Glu Lys Ser Glu 640 645 650 655 gtt cta atg aag ggc agc aga tca aaggac aat tgt tac cta tgg aca 3276 Val Leu Met Lys Gly Ser Arg Ser Lys AspAsn Cys Tyr Leu Trp Thr 660 665 670 ccc caa gaa acc agc tac tcc tct acatgt cta tcc tcc aaa gaa gat 3324 Pro Gln Glu Thr Ser Tyr Ser Ser Thr CysLeu Ser Ser Lys Glu Asp 675 680 685 gaa gtc aga ata tgg cat caa agg tttgga cat ctg cac tta aga ggc 3372 Glu Val Arg Ile Trp His Gln Arg Phe GlyHis Leu His Leu Arg Gly 690 695 700 atg aag aaa atc ctt gac aaa agt gctgtt aga ggc att ccc aat ctg 3420 Met Lys Lys Ile Leu Asp Lys Ser Ala ValArg Gly Ile Pro Asn Leu 705 710 715 aaa ata gaa gaa ggc aga atc tgt ggtgaa tgt cag att gga aag caa 3468 Lys Ile Glu Glu Gly Arg Ile Cys Gly GluCys Gln Ile Gly Lys Gln 720 725 730 735 gtc aag atg tcc cac cag aag cttcaa cat cag acc act tcc agg gtg 3516 Val Lys Met Ser His Gln Lys Leu GlnHis Gln Thr Thr Ser Arg Val 740 745 750 ctg gaa cta ctt cac atg gat ttgatg ggg cct atg caa gtt gaa agc 3564 Leu Glu Leu Leu His Met Asp Leu MetGly Pro Met Gln Val Glu Ser 755 760 765 ctt gga gga aag agg tat gcc tatgtt gtt gtg gat gat ttc tcc aga 3612 Leu Gly Gly Lys Arg Tyr Ala Tyr ValVal Val Asp Asp Phe Ser Arg 770 775 780 ttt acc tgg gta aac ttt atc agagag aaa tca gga acc ttt gaa gta 3660 Phe Thr Trp Val Asn Phe Ile Arg GluLys Ser Gly Thr Phe Glu Val 785 790 795 ttc aag aag ttg agt cta aga cttcaa aga gag aaa 
gac tgt gtc atc 3708 Phe Lys Lys Leu Ser Leu Arg Leu GlnArg Glu Lys Asp Cys Val Ile 800 805 810 815 aag aga atc agg agt gac catggc aga gaa ttt gaa aac agc agg ttc 3756 Lys Arg Ile Arg Ser Asp His GlyArg Glu Phe Glu Asn Ser Arg Phe 820 825 830 act gaa ttc tgc aca tct gaaggc atc act cat gag ttc tct gca gcc 3804 Thr Glu Phe Cys Thr Ser Glu GlyIle Thr His Glu Phe Ser Ala Ala 835 840 845 att aca cca caa cag aat gggata gtt gag agg aaa aac agg acc ttg 3852 Ile Thr Pro Gln Gln Asn Gly IleVal Glu Arg Lys Asn Arg Thr Leu 850 855 860 caa gag gct gct cgg gtc atgctt cat gcc aaa gaa ctt ccc tat aat 3900 Gln Glu Ala Ala Arg Val Met LeuHis Ala Lys Glu Leu Pro Tyr Asn 865 870 875 ctc tgg gct gaa gcc atg aacaca gca tgt tac atc cac aac aga gtc 3948 Leu Trp Ala Glu Ala Met Asn ThrAla Cys Tyr Ile His Asn Arg Val 880 885 890 895 aca ctg aga aga ggg actcca acc acc ctg tat gaa atc tgg aaa ggg 3996 Thr Leu Arg Arg Gly Thr ProThr Thr Leu Tyr Glu Ile Trp Lys Gly 900 905 910 agg aag cca tct gtc aagcac ttc cac atc ttt gga agt cca tgt tac 4044 Arg Lys Pro Ser Val Lys HisPhe His Ile Phe Gly Ser Pro Cys Tyr 915 920 925 atc ttg gca gat aga gagcaa agg aga aag atg gat ccc aag agt gat 4092 Ile Leu Ala Asp Arg Glu GlnArg Arg Lys Met Asp Pro Lys Ser Asp 930 935 940 gca gga ata ttc ctg ggatac tct aca aac agc aga gca tat aga gta 4140 Ala Gly Ile Phe Leu Gly TyrSer Thr Asn Ser Arg Ala Tyr Arg Val 945 950 955 ttc aat tcc aga acc agaaca gtg atg gaa tcc atc aat gtg gtt gtt 4188 Phe Asn Ser Arg Thr Arg ThrVal Met Glu Ser Ile Asn Val Val Val 960 965 970 975 gat gat ctg tct ccagca aga aag aag gat gtc gaa gaa gat gtc aga 4236 Asp Asp Leu Ser Pro AlaArg Lys Lys Asp Val Glu Glu Asp Val Arg 980 985 990 aca tcg gga gac aatgta gca gat gca gct aaa agt gga gaa aat gca 4284 Thr Ser Gly Asp Asn ValAla Asp Ala Ala Lys Ser Gly Glu Asn Ala 995 1000 1005 gaa aac tct gattct gct aca gat gaa tca aac atc aac caa cct 4329 Glu Asn Ser Asp Ser AlaThr Asp Glu Ser Asn Ile Asn Gln Pro 1010 1015 1020 gac aag aga tcc tccact aga atc cag 
aag atg cac ccc aag gag 4374 Asp Lys Arg Ser Ser Thr Arg Ile Gln Lys Met His Pro Lys Glu 1025 1030 1035 ctg att ata gga gat cca aac aga ggg gtc act aca aga tca agg 4419 Leu Ile Ile Gly Asp Pro Asn Arg Gly Val Thr Thr Arg Ser Arg 1040 1045 1050 gag gtt gag atc gtc tca aac tca tgt ttt gtc tcc aaa att gag 4464 Glu Val Glu Ile Val Ser Asn Ser Cys Phe Val Ser Lys Ile Glu 1055 1060 1065 ccc aag aac gtg aaa gag gca ctg aca gat gag ttc tgg atc aat 4509 Pro Lys Asn Val Lys Glu Ala Leu Thr Asp Glu Phe Trp Ile Asn 1070 1075 1080 gct atg caa gaa gaa ttg gag caa ttc aaa agg aat gaa gtc tgg 4554 Ala Met Gln Glu Glu Leu Glu Gln Phe Lys Arg Asn Glu Val Trp 1085 1090 1095 gag cta gtt cct agg cct gag gga act aat gtg att ggc acc aag 4599 Glu Leu Val Pro Arg Pro Glu Gly Thr Asn Val Ile Gly Thr Lys 1100 1105 1110 tgg atc ttc aag aac aaa acc aat gaa gaa ggt gtc ata acc aga 4644 Trp Ile Phe Lys Asn Lys Thr Asn Glu Glu Gly Val Ile Thr Arg 1115 1120 1125 aac aag gcc aga ctg gtt gct caa ggc tac act cag att gaa ggt 4689 Asn Lys Ala Arg Leu Val Ala Gln Gly Tyr Thr Gln Ile Glu Gly 1130 1135 1140 gta gac ttt gat gag act ttt gcc cca gtt gct aga ctt gag tcc 4734 Val Asp Phe Asp Glu Thr Phe Ala Pro Val Ala Arg Leu Glu Ser 1145 1150 1155 atc aga tta tta ctt ggt gta gct tgc atc ctc aaa ttc aag ctg 4779 Ile Arg Leu Leu Leu Gly Val Ala Cys Ile Leu Lys Phe Lys Leu 1160 1165 1170 tac cag atg gat gtg aaa agc gca ttt ctg aat gga tac ctg aat 4824 Tyr Gln Met Asp Val Lys Ser Ala Phe Leu Asn Gly Tyr Leu Asn 1175 1180 1185 gaa gaa gtc tat gtg gag cag cca aag gga ttt gca gac ccg act 4869 Glu Glu Val Tyr Val Glu Gln Pro Lys Gly Phe Ala Asp Pro Thr 1190 1195 1200 cat cca gat cat gta tac agg ctc aag aag gct ctc tat gga ttg 4914 His Pro Asp His Val Tyr Arg Leu Lys Lys Ala Leu Tyr Gly Leu 1205 1210 1215 aag caa gct cca aga gct tgg tat gaa agg cta aca gag ttc ctt 4959 Lys Gln Ala Pro Arg Ala Trp Tyr Glu Arg Leu Thr Glu Phe Leu 1220 1225 1230 act cag caa ggg tat agg aag gga gga att gac aag act ctc ttt 5004 Thr Gln Gln Gly Tyr Arg Lys Gly Gly
Ile Asp Lys Thr Leu Phe 1235 1240 1245 gtc aag caa gat gct gaa aac ttg atg att gca cag ata tat gtt 5049 Val Lys Gln Asp Ala Glu Asn Leu Met Ile Ala Gln Ile Tyr Val 1250 1255 1260 gat gac att gtg ttt gga ggg atg tcg aat gag atg ctt cga cat 5094 Asp Asp Ile Val Phe Gly Gly Met Ser Asn Glu Met Leu Arg His 1265 1270 1275 ttt gtt caa cag atg caa tct gaa ttt gag atg agt ctt gtt gga 5139 Phe Val Gln Gln Met Gln Ser Glu Phe Glu Met Ser Leu Val Gly 1280 1285 1290 gag ctg act tat ttt ctg gga ctt caa gtg aag cag atg gag gac 5184 Glu Leu Thr Tyr Phe Leu Gly Leu Gln Val Lys Gln Met Glu Asp 1295 1300 1305 tcc ata ttc ctc tca caa agc agg tat gca aag aac att gtc aag 5229 Ser Ile Phe Leu Ser Gln Ser Arg Tyr Ala Lys Asn Ile Val Lys 1310 1315 1320 aag ttt ggg atg gag aat gcc agc cat aaa aga aca cct gca cct 5274 Lys Phe Gly Met Glu Asn Ala Ser His Lys Arg Thr Pro Ala Pro 1325 1330 1335 act cac ttg aag ctg tca aag gat gaa gct ggc acc agt gtt gat 5319 Thr His Leu Lys Leu Ser Lys Asp Glu Ala Gly Thr Ser Val Asp 1340 1345 1350 caa aag cct tac aga agc atg ata ggg agc tta cta tat tta aca 5364 Gln Lys Pro Tyr Arg Ser Met Ile Gly Ser Leu Leu Tyr Leu Thr 1355 1360 1365 gct agc aga ccc gac atc acc tat gca gtg ggt gtt tgt gca aga 5409 Ala Ser Arg Pro Asp Ile Thr Tyr Ala Val Gly Val Cys Ala Arg 1370 1375 1380 tat caa gcc aat ccc aag ata agt cac ttg aat caa gta aag aga 5454 Tyr Gln Ala Asn Pro Lys Ile Ser His Leu Asn Gln Val Lys Arg 1385 1390 1395 att ctg aaa tat gta aat ggc act agt gac tat ggg att atg tac 5499 Ile Leu Lys Tyr Val Asn Gly Thr Ser Asp Tyr Gly Ile Met Tyr 1400 1405 1410 tgt cat tgt tca agt tca atg ctg gtt ggg tat tgt gat gct gat 5544 Cys His Cys Ser Ser Ser Met Leu Val Gly Tyr Cys Asp Ala Asp 1415 1420 1425 tgg gct ggg agt gca gat gac aga aaa agc act tct ggt gga tgc 5589 Trp Ala Gly Ser Ala Asp Asp Arg Lys Ser Thr Ser Gly Gly Cys 1430 1435 1440 ttc tat ttg gga aac aat ctt att tca tgg ttc agc aag aag cag 5634 Phe Tyr Leu Gly Asn Asn Leu Ile Ser Trp Phe Ser Lys Lys Gln 1445 1450 1455 aac tgt gtg tcc cta tct aca
gca gaa gcc gag tat att gca gca 5679 Asn Cys Val Ser Leu Ser Thr Ala Glu Ala Glu Tyr Ile Ala Ala 1460 1465 1470 gga agc agc tgt tca cag cta gtt tgg atg aag cag atg ctg aag 5724 Gly Ser Ser Cys Ser Gln Leu Val Trp Met Lys Gln Met Leu Lys 1475 1480 1485 gag tac aat gtc gaa caa gat gtc atg aca ttg tac tgt gac aac 5769 Glu Tyr Asn Val Glu Gln Asp Val Met Thr Leu Tyr Cys Asp Asn 1490 1495 1500 atg agt gct att aat att tct aaa aat cct gtt caa cac agc aga 5814 Met Ser Ala Ile Asn Ile Ser Lys Asn Pro Val Gln His Ser Arg 1505 1510 1515 acc aag cac att gac att aga cat cac tat atc aga gat ctt gtt 5859 Thr Lys His Ile Asp Ile Arg His His Tyr Ile Arg Asp Leu Val 1520 1525 1530 gat gat aaa gtg atc aca ctg aag cat gtt gac act gag gaa caa 5904 Asp Asp Lys Val Ile Thr Leu Lys His Val Asp Thr Glu Glu Gln 1535 1540 1545 ata gca gat att ttc aca aag gct ttg gat gca aat cag ttt gaa 5949 Ile Ala Asp Ile Phe Thr Lys Ala Leu Asp Ala Asn Gln Phe Glu 1550 1555 1560 aaa ctg agg ggc aag ctg ggc att tgt gtg cta gag gaa tta tag 5994 Lys Leu Arg Gly Lys Leu Gly Ile Cys Val Leu Glu Glu Leu 1565 1570 1575 caa cta cag caa tct gaa cgt gcc caa acg aat cac tta aca tta 6039 Gln Leu Gln Gln Ser Glu Arg Ala Gln Thr Asn His Leu Thr Leu 1580 1585 1590 ata gca cgt tca cta ctg aac caa gga aaa ttc gac cgt tgc ttc 6084 Ile Ala Arg Ser Leu Leu Asn Gln Gly Lys Phe Asp Arg Cys Phe 1595 1600 1605 aca cga ccc tct aca ttc ctc att caa atc tat atc tgc ttg gca 6129 Thr Arg Pro Ser Thr Phe Leu Ile Gln Ile Tyr Ile Cys Leu Ala 1610 1615 1620 ttc gtg ttt tta cca gca ttt ccc aat agc ctt ctg aga ttt acg 6174 Phe Val Phe Leu Pro Ala Phe Pro Asn Ser Leu Leu Arg Phe Thr 1625 1630 1635 aaa tca ttc caa acg ctc tgc ttt tcc atg gct acc tca tca aaa 6219 Lys Ser Phe Gln Thr Leu Cys Phe Ser Met Ala Thr Ser Ser Lys 1640 1645 1650 gaa act gca gct tct ggt tca cca tct gtc cca tca tct tca cac 6264 Glu Thr Ala Ala Ser Gly Ser Pro Ser Val Pro Ser Ser Ser His 1655 1660 1665 cag gaa caa cct gaa ctc aac atc caa
Pro Ile Gln Ile Ile Pro 1670 1675 1680 ggt caa gcc tct gtc cct gag aaa ctg gtt ccc aga aga cca cag 6354 Gly Gln Ala Ser Val Pro Glu Lys Leu Val Pro Arg Arg Pro Gln 1685 1690 1695 gga gtg aag att gct gaa aac cct agc cct gca acg agt cct agg 6399 Gly Val Lys Ile Ala Glu Asn Pro Ser Pro Ala Thr Ser Pro Arg 1700 1705 1710 gaa gta gac acg gag atg gac aag aaa ata cgc agc att gtg agt 6444 Glu Val Asp Thr Glu Met Asp Lys Lys Ile Arg Ser Ile Val Ser 1715 1720 1725 agc atc ttg aaa gac gcc tct gtt cct gaa gct gat gaa gat gtc 6489 Ser Ile Leu Lys Asp Ala Ser Val Pro Glu Ala Asp Glu Asp Val 1730 1735 1740 cca aca tcg tcc aac cca aat gtt tct gtg cct gat gtc aag aaa 6534 Pro Thr Ser Ser Asn Pro Asn Val Ser Val Pro Asp Val Lys Lys 1745 1750 1755 gat gtt cca aca tct tcc gct cca aat gct gaa gca ctc cct tca 6579 Asp Val Pro Thr Ser Ser Ala Pro Asn Ala Glu Ala Leu Pro Ser 1760 1765 1770 ccc ggt gaa gag gga tca act gag gaa gat gat caa gcc gca gag 6624 Pro Gly Glu Glu Gly Ser Thr Glu Glu Asp Asp Gln Ala Ala Glu 1775 1780 1785 gag act cct gca cca cgg gca cca gaa cct gct cca ggt gat ctc 6669 Glu Thr Pro Ala Pro Arg Ala Pro Glu Pro Ala Pro Gly Asp Leu 1790 1795 1800 att gac tta gaa gaa gtc gaa tct gat gaa gaa ccc att gcc aac 6714 Ile Asp Leu Glu Glu Val Glu Ser Asp Glu Glu Pro Ile Ala Asn 1805 1810 1815 cgg ttg gca cct ggc att gca gaa agg tta caa agc aga aaa ggg 6759 Arg Leu Ala Pro Gly Ile Ala Glu Arg Leu Gln Ser Arg Lys Gly 1820 1825 1830 aag acc ccc att aag agg tct gga cga atc aaa aca atg gcc cag 6804 Lys Thr Pro Ile Lys Arg Ser Gly Arg Ile Lys Thr Met Ala Gln 1835 1840 1845 aag aag agt act cca atc act cct gcc aca tcc aga aga agc aag 6849 Lys Lys Ser Thr Pro Ile Thr Pro Ala Thr Ser Arg Arg Ser Lys 1850 1855 1860 gtt gct atc ccc tcc aag aag agg aaa gaa att tcg tca tcc gat 6894 Val Ala Ile Pro Ser Lys Lys Arg Lys Glu Ile Ser Ser Ser Asp 1865 1870 1875 tct gat aag gat gtc gaa cta gat gtc tcg aca tct aag aag gcc 6939 Ser Asp Lys Asp Val Glu Leu Asp Val Ser Thr Ser Lys Lys Ala 1880 1885 1890 aag act tca ggg aaa aag gtg
cct gga aat gtc cct gat gca cca 6984 Lys Thr Ser Gly Lys Lys Val Pro Gly Asn Val Pro Asp Ala Pro 1895 1900 1905 ttg gac aac atc tct ttc cac tcc att ggc aat gtt gaa aag tgg 7029 Leu Asp Asn Ile Ser Phe His Ser Ile Gly Asn Val Glu Lys Trp 1910 1915 1920 aaa tat gtg tat caa cgc aga ctt gcg gtt gag aga gaa ctg gga 7074 Lys Tyr Val Tyr Gln Arg Arg Leu Ala Val Glu Arg Glu Leu Gly 1925 1930 1935 aga gat gcc ttg gat tgc aag gag atc atg gac ctc atc aag gct 7119 Arg Asp Ala Leu Asp Cys Lys Glu Ile Met Asp Leu Ile Lys Ala 1940 1945 1950 ggt gga ctg ctg aag act gtc agc aag ttg gga gat tgc tat gaa 7164 Gly Gly Leu Leu Lys Thr Val Ser Lys Leu Gly Asp Cys Tyr Glu 1955 1960 1965 ggc tta gtc agg gaa ttc att gtc aac att ccc tct gac ata tct 7209 Gly Leu Val Arg Glu Phe Ile Val Asn Ile Pro Ser Asp Ile Ser 1970 1975 1980 aac aga aaa agt gat gag tat caa aag gtg ttt gtc aga gga aag 7254 Asn Arg Lys Ser Asp Glu Tyr Gln Lys Val Phe Val Arg Gly Lys 1985 1990 1995 tgt gtt aaa ttc tcc cct gct gtg att aac aaa tat ctg ggc aga 7299 Cys Val Lys Phe Ser Pro Ala Val Ile Asn Lys Tyr Leu Gly Arg 2000 2005 2010 cct act gat gga gtg ata gat att gat gtt tct gag cat caa att 7344 Pro Thr Asp Gly Val Ile Asp Ile Asp Val Ser Glu His Gln Ile 2015 2020 2025 gcc aag gaa atc act gcc aaa cga gtc cag cat tgg cca aag aaa 7389 Ala Lys Glu Ile Thr Ala Lys Arg Val Gln His Trp Pro Lys Lys 2030 2035 2040 ggg aag ctt tca gca gga aag cta agt gtg aag tat gcc att ctg 7434 Gly Lys Leu Ser Ala Gly Lys Leu Ser Val Lys Tyr Ala Ile Leu 2045 2050 2055 cac agg att gga gct gca aac tgg gtt ccc acc aat cat act tcc 7479 His Arg Ile Gly Ala Ala Asn Trp Val Pro Thr Asn His Thr Ser 2060 2065 2070 act gtt gcc aca ggt ttg ggt aaa ttt ctg tat gct gtt gga acc 7524 Thr Val Ala Thr Gly Leu Gly Lys Phe Leu Tyr Ala Val Gly Thr 2075 2080 2085 aaa tcc aaa ttt aat ttt gga aac tat atc ttt gat caa act gtt 7569 Lys Ser Lys Phe Asn Phe Gly Asn Tyr Ile Phe Asp Gln Thr Val 2090 2095 2100 aag cat tca gaa tct ttt gct atc aaa tta ccc att gcc ttc cct 7614 Lys His Ser Glu Ser Phe Ala Ile
Lys Leu Pro Ile Ala Phe Pro 2105 2110 2115 act gta ttg tgt ggc att atg ttg agt cag cat ccc aat atg tta 7659 Thr Val Leu Cys Gly Ile Met Leu Ser Gln His Pro Asn Met Leu 2120 2125 2130 aac tac act gac tct gtg atg aag aga gaa tct cct cta tcc ctg 7704 Asn Tyr Thr Asp Ser Val Met Lys Arg Glu Ser Pro Leu Ser Leu 2135 2140 2145 cat tac aaa ctg ttt gaa ggg aca cat gtc cca gac att gtc tcg 7749 His Tyr Lys Leu Phe Glu Gly Thr His Val Pro Asp Ile Val Ser 2150 2155 2160 aca tct gtc tcg aca tca ggg aaa gct gct gct tca ggt gct gtg 7794 Thr Ser Val Ser Thr Ser Gly Lys Ala Ala Ala Ser Gly Ala Val 2165 2170 2175 tcc aag gat gct ctg att gct gaa ctc aag gac aca tgc aag gtg 7839 Ser Lys Asp Ala Leu Ile Ala Glu Leu Lys Asp Thr Cys Lys Val 2180 2185 2190 ctg gaa gca acc atc aaa gcc acc aca gag aag aag atg gag cta 7884 Leu Glu Ala Thr Ile Lys Ala Thr Thr Glu Lys Lys Met Glu Leu 2195 2200 2205 gaa ctg ctg atc aaa agg ctc tca gag agt ggc att gat gat gaa 7929 Glu Leu Leu Ile Lys Arg Leu Ser Glu Ser Gly Ile Asp Asp Glu 2210 2215 2220 gaa gca gct gag gaa gaa gga gaa gca gct gaa gaa gaa gaa gaa 7974 Glu Ala Ala Glu Glu Glu Gly Glu Ala Ala Glu Glu Glu Glu Glu 2225 2230 2235 gct gct gag gaa gag gaa gat gca gca gaa gaa aca gaa tca gat 8019 Ala Ala Glu Glu Glu Glu Asp Ala Ala Glu Glu Thr Glu Ser Asp 2240 2245 2250 gat gat tct gaa gcc acc cca tgatcatcag acctttaatt ttgtttttac 8070 Asp Asp Ser Glu Ala Thr Pro 2255 ttttattaga tataggggca tgttcctttg aacaattact agttattggt ctgtaatatt 8130 tgcacattaa tttcatgcat cctacttttg ccaaatttat gtctaaaaag ggggagtaat 8190 agtattatgc ttgctattat gcatgatttt gagtagtagg atactatgta tgatgtatgg 8250 cagtaggaaa cgatgtatgc atgattcatg actttgaggg ggagttgtat gaatatgatc 8310 ttgaggggga gactgctgct gaggatgaat gatgtaagct actagaagat gctgtagtaa 8370 gagcatgaag acagggggag cagatagcgg atgtcacatg agatgtctcg acatccttga 8430 aaagactagt agctgataga agatgctgca gtaagcatgg agacaggggg agcagaagca 8490 gaaagctgat gtcacgcgag atgtcttgac atcctggaga agacttgtag attagcaact 8550 tgaagaattt ccgctgtgct tgattactct gaaaatggaa gttgctgatt
ccacatgcat 8610 aactgctcgt acctgctcag gaagtgtcta agtatgtttt agacaaaatt tgccaaaggg 8670 ggagattgtt agtgcttagc actactgagt ttaaaaaggt tggctaagat tttgttaaaa 8730 cataagcact tagacaatga aggaaagctg gagttgctgc acatgatgtc caacgttatg 8790 tcaaggaata agatcgggct gcataatgca caaggcaaga taaagtgtca agtgatgaat 8850 tgaagttgaa ggatccacga tgtcggatac aatgtcctga catcctgctc gagaatactg 8910 gaagtgctgt acaatgcaag ataaaagtca agtgaagcat tgaagctgca ggatccaaga 8970 tgtcggatac gatgtcctga catctggccc gataatactg gacatataaa tctgttatat 9030 ctttaacaga ttattgtgca gttagcaaga gattagaaga tc 9072 88 1576 PRT Glycine max misc_feature SIRE1 7 88 Met Asn Met Glu Lys Glu Gly Gly Pro Val Asn Arg Pro Pro Ile Leu 1 5 10 15 Asp Gly Ser Asn Tyr Glu Tyr Trp Lys Ala Arg Met Val Ala Phe Leu 20 25 30 Lys Ser Leu Asp Ser Arg Thr Trp Lys Ala Val Ile Lys Gly Trp Glu 35 40 45 His Pro Lys Met Leu Asp Thr Glu Gly Lys Pro Thr Asp Glu Leu Lys 50 55 60 Pro Glu Glu Asp Trp Thr Lys Glu Glu Asp Glu Leu Ala Leu Gly Asn 65 70 75 80 Ser Lys Ala Leu Asn Ala Leu Phe Asn Gly Val Asp Lys Asn Ile Phe 85 90 95 Arg Leu Ile Asn Thr Cys Thr Val Ala Lys Asp Ala Cys Glu Ile Leu 100 105 110 Lys Ser Thr His Glu Gly Thr Ser Lys Val Lys Met Ser Arg Leu Gln 115 120 125 Leu Leu Ala Thr Lys Phe Glu Asn Leu Lys Met Lys Glu Glu Glu Cys 130 135 140 Ile His Asp Phe His Met Asn Ile Leu Glu Ile Ala Asn Ala Cys Thr 145 150 155 160 Ala Leu Gly Glu Arg Ile Thr Asp Glu Lys Leu Val Arg Lys Ile Leu 165 170 175 Arg Ser Leu Pro Lys Arg Phe Asp Met Lys Val Thr Ala Ile Glu Glu 180 185 190 Ala Gln Asp Ile Cys Asn Met Arg Val Asp Glu Leu Ile Gly Ser Leu 195 200 205 Gln Thr Phe Glu Leu Gly Leu Ser Asp Arg Ala Glu Lys Lys Ser Lys 210 215 220 Asn Leu Ala Phe Val Ser Asn Asp Glu Gly Glu Glu Asp Glu Tyr Asp 225 230 235 240 Leu Asp Thr Asp Glu Gly Leu Thr Asn Ala Val Val Leu Leu Gly Lys 245 250 255 Gln Phe Asn Lys Val Leu Asn Arg Met Asp Lys Arg Gln Lys Pro His 260 265 270 Val Gln Asn Ile Pro Phe Asp Ile Arg Lys Gly Ser Lys Tyr Gln Lys 275 280 285 Arg Ser Asp Val Lys Pro Ser His Ser Lys Gly Ile Gln
Cys His Gly 290 295 300 Cys Glu Gly Tyr Gly His Ile Ile Ala Glu Cys Pro Thr His Leu Lys 305 310 315 320 Lys His Arg Lys Gly Leu Ser Val Cys Gln Ser Asp Thr Glu Ser Glu 325 330 335 Gln Glu Ser Asp Ser Asp Arg Asp Val Asn Ala Leu Ile Gly Ile Phe 340 345 350 Glu Thr Ala Glu Asp Ser Ser Asp Thr Asp Ser Glu Ile Thr Phe Asp 355 360 365 Glu Leu Ala Ala Ser Tyr Arg Lys Leu Cys Ile Lys Ser Glu Lys Ile 370 375 380 Leu Gln Gln Glu Ala Gln Leu Lys Lys Val Ile Ala Asp Leu Glu Ala 385 390 395 400 Glu Lys Glu Ala His Lys Glu Glu Ile Ser Glu Leu Lys Gly Glu Val 405 410 415 Gly Phe Leu Asn Ser Lys Leu Glu Asn Met Thr Lys Ser Ile Lys Met 420 425 430 Leu Asn Lys Gly Ser Asp Thr Leu Asp Glu Val Leu Leu Leu Gly Lys 435 440 445 Asn Ala Gly Asn Gln Arg Gly Leu Gly Phe Asn Pro Lys Ser Ala Gly 450 455 460 Arg Thr Thr Met Thr Glu Phe Val Pro Ala Lys Asn Arg Thr Gly Ala 465 470 475 480 Thr Met Ser Gln His Arg Ser Arg His His Gly Met Gln Gln Lys Lys 485 490 495 Ser Lys Arg Lys Lys Trp Arg Cys His Tyr Cys Gly Lys Tyr Gly His 500 505 510 Ile Lys Pro Phe Cys Tyr His Leu His Gly His Pro His His Gly Thr 515 520 525 Gln Ser Ser Asn Ser Arg Lys Lys Met Met Trp Val Pro Lys His Lys 530 535 540 Ala Val Ser Leu Val Val His Thr Ser Leu Arg Ala Ser Ala Lys Glu 545 550 555 560 Asp Trp Tyr Leu Asp Ser Gly Cys Ser Arg His Met Thr Gly Val Lys 565 570 575 Glu Phe Leu Leu Asn Ile Glu Pro Cys Ser Thr Ser Tyr Val Thr Phe 580 585 590 Gly Asp Gly Ser Lys Gly Lys Ile Ile Gly Met Gly Lys Leu Val His 595 600 605 Asp Gly Leu Pro Ser Leu Asn Lys Val Leu Leu Val Lys Gly Leu Thr 610 615 620 Ala Asn Leu Ile Ser Ile Ser Gln Leu Cys Asp Glu Gly Phe Asn Val 625 630 635 640 Asn Phe Thr Lys Ser Glu Cys Leu Val Thr Asn Glu Lys Ser Glu Val 645 650 655 Leu Met Lys Gly Ser Arg Ser Lys Asp Asn Cys Tyr Leu Trp Thr Pro 660 665 670 Gln Glu Thr Ser Tyr Ser Ser Thr Cys Leu Ser Ser Lys Glu Asp Glu 675 680 685 Val Arg Ile Trp His Gln Arg Phe Gly His Leu His Leu Arg Gly Met 690 695 700 Lys Lys Ile Leu Asp Lys Ser Ala Val Arg Gly Ile Pro Asn Leu Lys 705 710 715 720 Ile
Glu Glu Gly Arg Ile Cys Gly Glu Cys Gln Ile Gly Lys Gln Val 725 730 735 Lys Met Ser His Gln Lys Leu Gln His Gln Thr Thr Ser Arg Val Leu 740 745 750 Glu Leu Leu His Met Asp Leu Met Gly Pro Met Gln Val Glu Ser Leu 755 760 765 Gly Gly Lys Arg Tyr Ala Tyr Val Val Val Asp Asp Phe Ser Arg Phe 770 775 780 Thr Trp Val Asn Phe Ile Arg Glu Lys Ser Gly Thr Phe Glu Val Phe 785 790 795 800 Lys Lys Leu Ser Leu Arg Leu Gln Arg Glu Lys Asp Cys Val Ile Lys 805 810 815 Arg Ile Arg Ser Asp His Gly Arg Glu Phe Glu Asn Ser Arg Phe Thr 820 825 830 Glu Phe Cys Thr Ser Glu Gly Ile Thr His Glu Phe Ser Ala Ala Ile 835 840 845 Thr Pro Gln Gln Asn Gly Ile Val Glu Arg Lys Asn Arg Thr Leu Gln 850 855 860 Glu Ala Ala Arg Val Met Leu His Ala Lys Glu Leu Pro Tyr Asn Leu 865 870 875 880 Trp Ala Glu Ala Met Asn Thr Ala Cys Tyr Ile His Asn Arg Val Thr 885 890 895 Leu Arg Arg Gly Thr Pro Thr Thr Leu Tyr Glu Ile Trp Lys Gly Arg 900 905 910 Lys Pro Ser Val Lys His Phe His Ile Phe Gly Ser Pro Cys Tyr Ile 915 920 925 Leu Ala Asp Arg Glu Gln Arg Arg Lys Met Asp Pro Lys Ser Asp Ala 930 935 940 Gly Ile Phe Leu Gly Tyr Ser Thr Asn Ser Arg Ala Tyr Arg Val Phe 945 950 955 960 Asn Ser Arg Thr Arg Thr Val Met Glu Ser Ile Asn Val Val Val Asp 965 970 975 Asp Leu Ser Pro Ala Arg Lys Lys Asp Val Glu Glu Asp Val Arg Thr 980 985 990 Ser Gly Asp Asn Val Ala Asp Ala Ala Lys Ser Gly Glu Asn Ala Glu 995 1000 1005 Asn Ser Asp Ser Ala Thr Asp Glu Ser Asn Ile Asn Gln Pro Asp 1010 1015 1020 Lys Arg Ser Ser Thr Arg Ile Gln Lys Met His Pro Lys Glu Leu 1025 1030 1035 Ile Ile Gly Asp Pro Asn Arg Gly Val Thr Thr Arg Ser Arg Glu 1040 1045 1050 Val Glu Ile Val Ser Asn Ser Cys Phe Val Ser Lys Ile Glu Pro 1055 1060 1065 Lys Asn Val Lys Glu Ala Leu Thr Asp Glu Phe Trp Ile Asn Ala 1070 1075 1080 Met Gln Glu Glu Leu Glu Gln Phe Lys Arg Asn Glu Val Trp Glu 1085 1090 1095 Leu Val Pro Arg Pro Glu Gly Thr Asn Val Ile Gly Thr Lys Trp 1100 1105 1110 Ile Phe Lys Asn Lys Thr Asn Glu Glu Gly Val Ile Thr Arg Asn 1115 1120 1125 Lys Ala Arg Leu Val Ala Gln Gly Tyr Thr Gln Ile
Glu Gly Val 1130 1135 1140 Asp Phe Asp Glu Thr Phe Ala Pro Val Ala Arg Leu Glu Ser Ile 1145 1150 1155 Arg Leu Leu Leu Gly Val Ala Cys Ile Leu Lys Phe Lys Leu Tyr 1160 1165 1170 Gln Met Asp Val Lys Ser Ala Phe Leu Asn Gly Tyr Leu Asn Glu 1175 1180 1185 Glu Val Tyr Val Glu Gln Pro Lys Gly Phe Ala Asp Pro Thr His 1190 1195 1200 Pro Asp His Val Tyr Arg Leu Lys Lys Ala Leu Tyr Gly Leu Lys 1205 1210 1215 Gln Ala Pro Arg Ala Trp Tyr Glu Arg Leu Thr Glu Phe Leu Thr 1220 1225 1230 Gln Gln Gly Tyr Arg Lys Gly Gly Ile Asp Lys Thr Leu Phe Val 1235 1240 1245 Lys Gln Asp Ala Glu Asn Leu Met Ile Ala Gln Ile Tyr Val Asp 1250 1255 1260 Asp Ile Val Phe Gly Gly Met Ser Asn Glu Met Leu Arg His Phe 1265 1270 1275 Val Gln Gln Met Gln Ser Glu Phe Glu Met Ser Leu Val Gly Glu 1280 1285 1290 Leu Thr Tyr Phe Leu Gly Leu Gln Val Lys Gln Met Glu Asp Ser 1295 1300 1305 Ile Phe Leu Ser Gln Ser Arg Tyr Ala Lys Asn Ile Val Lys Lys 1310 1315 1320 Phe Gly Met Glu Asn Ala Ser His Lys Arg Thr Pro Ala Pro Thr 1325 1330 1335 His Leu Lys Leu Ser Lys Asp Glu Ala Gly Thr Ser Val Asp Gln 1340 1345 1350 Lys Pro Tyr Arg Ser Met Ile Gly Ser Leu Leu Tyr Leu Thr Ala 1355 1360 1365 Ser Arg Pro Asp Ile Thr Tyr Ala Val Gly Val Cys Ala Arg Tyr 1370 1375 1380 Gln Ala Asn Pro Lys Ile Ser His Leu Asn Gln Val Lys Arg Ile 1385 1390 1395 Leu Lys Tyr Val Asn Gly Thr Ser Asp Tyr Gly Ile Met Tyr Cys 1400 1405 1410 His Cys Ser Ser Ser Met Leu Val Gly Tyr Cys Asp Ala Asp Trp 1415 1420 1425 Ala Gly Ser Ala Asp Asp Arg Lys Ser Thr Ser Gly Gly Cys Phe 1430 1435 1440 Tyr Leu Gly Asn Asn Leu Ile Ser Trp Phe Ser Lys Lys Gln Asn 1445 1450 1455 Cys Val Ser Leu Ser Thr Ala Glu Ala Glu Tyr Ile Ala Ala Gly 1460 1465 1470 Ser Ser Cys Ser Gln Leu Val Trp Met Lys Gln Met Leu Lys Glu 1475 1480 1485 Tyr Asn Val Glu Gln Asp Val Met Thr Leu Tyr Cys Asp Asn Met 1490 1495 1500 Ser Ala Ile Asn Ile Ser Lys Asn Pro Val Gln His Ser Arg Thr 1505 1510 1515 Lys His Ile Asp Ile Arg His His Tyr Ile Arg Asp Leu Val Asp 1520 1525 1530 Asp Lys Val Ile Thr Leu Lys His Val Asp Thr Glu
Glu Gln Ile 1535 1540 1545 Ala Asp Ile Phe Thr Lys Ala Leu Asp Ala Asn Gln Phe Glu Lys 1550 1555 1560 Leu Arg Gly Lys Leu Gly Ile Cys Val Leu Glu Glu Leu 1565 1570 1575 89 682 PRT Glycine max misc_feature SIRE1 7 89 Gln Leu Gln Gln Ser Glu Arg Ala Gln Thr Asn His Leu Thr Leu Ile 1 5 10 15 Ala Arg Ser Leu Leu Asn Gln Gly Lys Phe Asp Arg Cys Phe Thr Arg 20 25 30 Pro Ser Thr Phe Leu Ile Gln Ile Tyr Ile Cys Leu Ala Phe Val Phe 35 40 45 Leu Pro Ala Phe Pro Asn Ser Leu Leu Arg Phe Thr Lys Ser Phe Gln 50 55 60 Thr Leu Cys Phe Ser Met Ala Thr Ser Ser Lys Glu Thr Ala Ala Ser 65 70 75 80 Gly Ser Pro Ser Val Pro Ser Ser Ser His Gln Glu Gln Pro Glu Leu 85 90 95 Asn Ile Gln Pro Ile Gln Ile Ile Pro Gly Gln Ala Ser Val Pro Glu 100 105 110 Lys Leu Val Pro Arg Arg Pro Gln Gly Val Lys Ile Ala Glu Asn Pro 115 120 125 Ser Pro Ala Thr Ser Pro Arg Glu Val Asp Thr Glu Met Asp Lys Lys 130 135 140 Ile Arg Ser Ile Val Ser Ser Ile Leu Lys Asp Ala Ser Val Pro Glu 145 150 155 160 Ala Asp Glu Asp Val Pro Thr Ser Ser Asn Pro Asn Val Ser Val Pro 165 170 175 Asp Val Lys Lys Asp Val Pro Thr Ser Ser Ala Pro Asn Ala Glu Ala 180 185 190 Leu Pro Ser Pro Gly Glu Glu Gly Ser Thr Glu Glu Asp Asp Gln Ala 195 200 205 Ala Glu Glu Thr Pro Ala Pro Arg Ala Pro Glu Pro Ala Pro Gly Asp 210 215 220 Leu Ile Asp Leu Glu Glu Val Glu Ser Asp Glu Glu Pro Ile Ala Asn 225 230 235 240 Arg Leu Ala Pro Gly Ile Ala Glu Arg Leu Gln Ser Arg Lys Gly Lys 245 250 255 Thr Pro Ile Lys Arg Ser Gly Arg Ile Lys Thr Met Ala Gln Lys Lys 260 265 270 Ser Thr Pro Ile Thr Pro Ala Thr Ser Arg Arg Ser Lys Val Ala Ile 275 280 285 Pro Ser Lys Lys Arg Lys Glu Ile Ser Ser Ser Asp Ser Asp Lys Asp 290 295 300 Val Glu Leu Asp Val Ser Thr Ser Lys Lys Ala Lys Thr Ser Gly Lys 305 310 315 320 Lys Val Pro Gly Asn Val Pro Asp Ala Pro Leu Asp Asn Ile Ser Phe 325 330 335 His Ser Ile Gly Asn Val Glu Lys Trp Lys Tyr Val Tyr Gln Arg Arg 340 345 350 Leu Ala Val Glu Arg Glu Leu Gly Arg Asp Ala Leu Asp Cys Lys Glu 355 360 365 Ile Met Asp Leu Ile Lys Ala Gly Gly Leu Leu Lys Thr Val Ser Lys
370 375 380 Leu Gly Asp Cys Tyr Glu Gly Leu Val Arg Glu Phe Ile Val Asn Ile 385 390 395 400 Pro Ser Asp Ile Ser Asn Arg Lys Ser Asp Glu Tyr Gln Lys Val Phe 405 410 415 Val Arg Gly Lys Cys Val Lys Phe Ser Pro Ala Val Ile Asn Lys Tyr 420 425 430 Leu Gly Arg Pro Thr Asp Gly Val Ile Asp Ile Asp Val Ser Glu His 435 440 445 Gln Ile Ala Lys Glu Ile Thr Ala Lys Arg Val Gln His Trp Pro Lys 450 455 460 Lys Gly Lys Leu Ser Ala Gly Lys Leu Ser Val Lys Tyr Ala Ile Leu 465 470 475 480 His Arg Ile Gly Ala Ala Asn Trp Val Pro Thr Asn His Thr Ser Thr 485 490 495 Val Ala Thr Gly Leu Gly Lys Phe Leu Tyr Ala Val Gly Thr Lys Ser 500 505 510 Lys Phe Asn Phe Gly Asn Tyr Ile Phe Asp Gln Thr Val Lys His Ser 515 520 525 Glu Ser Phe Ala Ile Lys Leu Pro Ile Ala Phe Pro Thr Val Leu Cys 530 535 540 Gly Ile Met Leu Ser Gln His Pro Asn Met Leu Asn Tyr Thr Asp Ser 545 550 555 560 Val Met Lys Arg Glu Ser Pro Leu Ser Leu His Tyr Lys Leu Phe Glu 565 570 575 Gly Thr His Val Pro Asp Ile Val Ser Thr Ser Val Ser Thr Ser Gly 580 585 590 Lys Ala Ala Ala Ser Gly Ala Val Ser Lys Asp Ala Leu Ile Ala Glu 595 600 605 Leu Lys Asp Thr Cys Lys Val Leu Glu Ala Thr Ile Lys Ala Thr Thr 610 615 620 Glu Lys Lys Met Glu Leu Glu Leu Leu Ile Lys Arg Leu Ser Glu Ser 625 630 635 640 Gly Ile Asp Asp Glu Glu Ala Ala Glu Glu Glu Gly Glu Ala Ala Glu 645 650 655 Glu Glu Glu Glu Ala Ala Glu Glu Glu Glu Asp Ala Ala Glu Glu Thr 660 665 670 Glu Ser Asp Asp Asp Ser Glu Ala Thr Pro 675 680 90 9358 DNA Glycine max misc_feature Soybean retroelement SIRE1 8 90 gcttagcgca tgatttttgt aggaacaccc atggggcaat ttggtttgca cattgttagt 60 gcttagcttt actgagtttt aaaagattgg ctaaaatttt gttaaaacat aagcacttag 120 acaatgaagg aaagctggag ttgctgcaca tgatgtctaa cattatgtca aggaatcaga 180 tcgggttgca caatgcacaa ggcaagataa aatgtcaaat gaagaattga agctgcagga 240 tccacgatgt cggatacaat gtccaggaca tcctgcccga aaatactgga cacataaatc 300 tgttatatct ttaacagatt aatgtgcagt cagcaacaga ttaggcgatc tatctttagg 360 aacgaattaa aagaaaatta aagttcgaat tacaaacttg aatagttcgt tcagggatta 420 aagattaaag ataaaaacta
aaagatcaaa ctttatcttt gagatcttta agtgcagatt 480 ttcaggagaa tgatagatct tatccagcgc aagttgttgc agcccagata cgcacactgc 540 tatataaaca tgaaggctgc acgagttttc taccaagtcc gagattgaag agttattttg 600 tgagttttgg gacttgagtg ttttgtgagc caccttgatg ttaccctaac atcaagtgtt 660 ggacctgagt gtgtagagtt gatctctatt gttcagagag caatctctgg tgtgtctttg 720 atttatttgt aaacacggga gagtgattga gagggagtga gaggggttct catatctaag 780 agtggctctt aggtagaggt tgcatgggta gtggttaggt gagaaggttg taaacagtgg 840 ctgttagatc ttcgaactaa cactatttta gtggatttcc tccctggctt ggtagccccc 900 agatgtaggt gacgttgcac cgaactgggt taacaattct cttgtgttat ttacttgttt 960 aatctgttca tactgtcaaa tataatctgc atgttctgaa gcgtgatgtc gtgacatccg 1020 gtacgacatc tgtcattggt atcagaattt caattggtat cagagcgggc actctaaatc 1080 actgagtgag atctagggag ataaattctg atg aac atg gag aaa gaa gga gga 1134 Met Asn Met Glu Lys Glu Gly Gly 1 5 cca gtg aac aga cca cca att ctg gat gga acc aac tat gaa tac tgg 1182 Pro Val Asn Arg Pro Pro Ile Leu Asp Gly Thr Asn Tyr Glu Tyr Trp 10 15 20 aaa gca agg atg gtg gcc ttc ctc aaa tca ctg gat agc aga acc tgg 1230 Lys Ala Arg Met Val Ala Phe Leu Lys Ser Leu Asp Ser Arg Thr Trp 25 30 35 40 aaa gct gtc atc aaa ggc tgg gaa cat ccc aag atg ttg gac aca gaa 1278 Lys Ala Val Ile Lys Gly Trp Glu His Pro Lys Met Leu Asp Thr Glu 45 50 55 gga aag ccc act aat gaa ttg aag cca gaa gaa gac tgg aca aaa gaa 1326 Gly Lys Pro Thr Asn Glu Leu Lys Pro Glu Glu Asp Trp Thr Lys Glu 60 65 70 gaa gac gaa ttg gca ctt gga aac tcc aaa gcc ttg aat gcc cta ttc 1374 Glu Asp Glu Leu Ala Leu Gly Asn Ser Lys Ala Leu Asn Ala Leu Phe 75 80 85 aat gga gtt gac aag aat atc ttc aga ctg atc aac aca tgc aca gtg 1422 Asn Gly Val Asp Lys Asn Ile Phe Arg Leu Ile Asn Thr Cys Thr Val 90 95 100 gcc aag gat gca tgt gga gag atc ctg aaa acc act cat gaa gga acc 1470 Ala Lys Asp Ala Cys Gly Glu Ile Leu Lys Thr Thr His Glu Gly Thr 105 110 115 120 tcc aaa gtg aag atg tcc aga ttg caa cta ttg gct aca aaa ttc gaa 1518 Ser Lys Val Lys Met Ser Arg Leu Gln Leu Leu Ala Thr Lys Phe Glu 125 130 135 aat ctg aag atg aag
gag gaa gag tgt att cat gac ttc cac atg aac 1566 Asn Leu Lys Met Lys Glu Glu Glu Cys Ile His Asp Phe His Met Asn 140 145 150 att ctt gaa att gcc aat gct tgc act gcc ttg gga gaa agg atg aca 1614 Ile Leu Glu Ile Ala Asn Ala Cys Thr Ala Leu Gly Glu Arg Met Thr 155 160 165 gat gaa aag ctg gtg aga aag atc ctc aga tct ttg cct aag aga ttt 1662 Asp Glu Lys Leu Val Arg Lys Ile Leu Arg Ser Leu Pro Lys Arg Phe 170 175 180 gac atg aaa gtc act gca ata gag gag gcc caa gac att tgc aac atg 1710 Asp Met Lys Val Thr Ala Ile Glu Glu Ala Gln Asp Ile Cys Asn Met 185 190 195 200 aga gta gat gaa ctc att ggt tcc ctt caa acc ttt gag cta gga ctc 1758 Arg Val Asp Glu Leu Ile Gly Ser Leu Gln Thr Phe Glu Leu Gly Leu 205 210 215 tcg gat agg aat gaa aag aag agc aag aac ctg gcg ttc gtg tcc aat 1806 Ser Asp Arg Asn Glu Lys Lys Ser Lys Asn Leu Ala Phe Val Ser Asn 220 225 230 gat gaa gga gaa gaa gat gag tat gac ctg gat act gat gaa ggg ctg 1854 Asp Glu Gly Glu Glu Asp Glu Tyr Asp Leu Asp Thr Asp Glu Gly Leu 235 240 245 act aac gca gtt ggg ctc ctt gga aaa cag ttc aac aaa gtg ctg aac 1902 Thr Asn Ala Val Gly Leu Leu Gly Lys Gln Phe Asn Lys Val Leu Asn 250 255 260 aga atg gac agg agg cag aaa cca cat gtc cgg aac atc cct ttc gac 1950 Arg Met Asp Arg Arg Gln Lys Pro His Val Arg Asn Ile Pro Phe Asp 265 270 275 280 atc agg aaa ggt agt gaa tac cac aaa aag tca gat gaa aag ccc agt 1998 Ile Arg Lys Gly Ser Glu Tyr His Lys Lys Ser Asp Glu Lys Pro Ser 285 290 295 cac agc aaa gga att caa tgc cat ggg tgt gaa ggc tat ggg cac atc 2046 His Ser Lys Gly Ile Gln Cys His Gly Cys Glu Gly Tyr Gly His Ile 300 305 310 aaa gct gaa tgt ccc acc cat ctc aag aag cag agg aaa gga ctt tct 2094 Lys Ala Glu Cys Pro Thr His Leu Lys Lys Gln Arg Lys Gly Leu Ser 315 320 325 gta tgt cgg tct gat gat aca gag agt gaa caa gaa agt gat tct gac 2142 Val Cys Arg Ser Asp Asp Thr Glu Ser Glu Gln Glu Ser Asp Ser Asp 330 335 340 aga gat gtg aat gca ctc act ggg aga ttt gaa tct gat gaa gat tca 2190 Arg Asp Val Asn Ala Leu Thr Gly Arg Phe Glu Ser Asp Glu Asp Ser 345 350 355 360 agt
gat att gaa atc act ttt gat gag ctt gct ata tcc tat aga aaa 2238 Ser Asp Ile Glu Ile Thr Phe Asp Glu Leu Ala Ile Ser Tyr Arg Lys 365 370 375 cta tgc atc aaa agt gag aag att ctt cag caa gaa gca caa ctg aag 2286 Leu Cys Ile Lys Ser Glu Lys Ile Leu Gln Gln Glu Ala Gln Leu Lys 380 385 390 aag gtc att gca aat ctg gag gct gag aag gag gca cat gaa gag gag 2334 Lys Val Ile Ala Asn Leu Glu Ala Glu Lys Glu Ala His Glu Glu Glu 395 400 405 atc tct gag ctt aaa gga gaa gtt ggt ttt ctg aac tct aaa ctg gaa 2382 Ile Ser Glu Leu Lys Gly Glu Val Gly Phe Leu Asn Ser Lys Leu Glu 410 415 420 aac atg aca aaa tca ata aag atg ctg aat aaa ggc tca gat atg ctt 2430 Asn Met Thr Lys Ser Ile Lys Met Leu Asn Lys Gly Ser Asp Met Leu 425 430 435 440 gat gag gtg cta cag ctt ggg aag aat gtt gga aac cag aga gga ctt 2478 Asp Glu Val Leu Gln Leu Gly Lys Asn Val Gly Asn Gln Arg Gly Leu 445 450 455 ggg ttt aat cat aaa tct gct tgc aga ata acc atg aca gaa ttt gtt 2526 Gly Phe Asn His Lys Ser Ala Cys Arg Ile Thr Met Thr Glu Phe Val 460 465 470 cct gcc aaa aac agc act gga gcc acg atg tca caa cat cgg tct cga 2574 Pro Ala Lys Asn Ser Thr Gly Ala Thr Met Ser Gln His Arg Ser Arg 475 480 485 cat cat gga acg cag cag aaa aag agc aaa aga aag aag tgg agg tgt 2622 His His Gly Thr Gln Gln Lys Lys Ser Lys Arg Lys Lys Trp Arg Cys 490 495 500 cac tac tgt ggc aag tat ggt cac ata aag ccc ttt tgc tat cat cta 2670 His Tyr Cys Gly Lys Tyr Gly His Ile Lys Pro Phe Cys Tyr His Leu 505 510 515 520 cat ggc cat cca cat cat gga act caa agt agc agc agc gga agg aag 2718 His Gly His Pro His His Gly Thr Gln Ser Ser Ser Ser Gly Arg Lys 525 530 535 atg atg tgg gtt cca aaa cac aag att gtt agt ctt gtt gtt cat act 2766 Met Met Trp Val Pro Lys His Lys Ile Val Ser Leu Val Val His Thr 540 545 550 tca ctt aga gca tca gct aag gaa gat tgg tac cta gat agc ggc tgt 2814 Ser Leu Arg Ala Ser Ala Lys Glu Asp Trp Tyr Leu Asp Ser Gly Cys 555 560 565 tcc aga cac atg aca gga gtt aaa gaa ttc ctg gtg aac att gaa cct 2862 Ser Arg His Met Thr Gly Val Lys Glu Phe Leu Val Asn Ile Glu Pro 570
575 580 tgc tcc act agc tat gtg aca ttt gga gat ggc tct aaa gga aag atc 2910 Cys Ser Thr Ser Tyr Val Thr Phe Gly Asp Gly Ser Lys Gly Lys Ile 585 590 595 600 act gga atg gga aag cta gtc cat gat gga ctt cct agt ctg aac aaa 2958 Thr Gly Met Gly Lys Leu Val His Asp Gly Leu Pro Ser Leu Asn Lys 605 610 615 gta ctg ctg gtg aag gga ctg act gcg aac ttg atc agc atc agt cag 3006 Val Leu Leu Val Lys Gly Leu Thr Ala Asn Leu Ile Ser Ile Ser Gln 620 625 630 ttg tgt gat gaa gga ttc aat gta aac ttc aca aag tca gaa tgc ttg 3054 Leu Cys Asp Glu Gly Phe Asn Val Asn Phe Thr Lys Ser Glu Cys Leu 635 640 645 gtg aca aat gag aag agt gaa gtt cta atg aag ggc agc aga tca aag 3102 Val Thr Asn Glu Lys Ser Glu Val Leu Met Lys Gly Ser Arg Ser Lys 650 655 660 gac aac tgt tac cta tgg aca cct caa gaa acc agt tac tcc tcc aca 3150 Asp Asn Cys Tyr Leu Trp Thr Pro Gln Glu Thr Ser Tyr Ser Ser Thr 665 670 675 680 tgt cta tcc tcc aaa gaa gat gaa gtc aaa ata tgg cat caa aga ttt 3198 Cys Leu Ser Ser Lys Glu Asp Glu Val Lys Ile Trp His Gln Arg Phe 685 690 695 gga cat ctg cac tta aga ggc atg aag aaa atc att gac aaa ggt gct 3246 Gly His Leu His Leu Arg Gly Met Lys Lys Ile Ile Asp Lys Gly Ala 700 705 710 gtt aga ggc att ccc aat ctg aaa ata gaa gaa ggc aga atc tgt ggt 3294 Val Arg Gly Ile Pro Asn Leu Lys Ile Glu Glu Gly Arg Ile Cys Gly 715 720 725 gaa tgt cag att gga aag caa gtc aag atg tcc cac cag aag ctt caa 3342 Glu Cys Gln Ile Gly Lys Gln Val Lys Met Ser His Gln Lys Leu Gln 730 735 740 cat cag acc act tcc atg gtg ctg gaa cta ctt cac atg gac ttg atg 3390 His Gln Thr Thr Ser Met Val Leu Glu Leu Leu His Met Asp Leu Met 745 750 755 760 ggg cct atg caa gtt gaa agc ctt gga gga aag agg tat gcc tat gtt 3438 Gly Pro Met Gln Val Glu Ser Leu Gly Gly Lys Arg Tyr Ala Tyr Val 765 770 775 gtt gtg gat gat ttc tcc aga ttt acc tgg gtc aac ttt atc aga gag 3486 Val Val Asp Asp Phe Ser Arg Phe Thr Trp Val Asn Phe Ile Arg Glu 780 785 790 aaa tca gac acc ttt gaa gta ttc aaa gag ttg agt cta aga ctt caa 3534 Lys Ser Asp Thr Phe Glu Val Phe Lys Glu Leu Ser Leu
Arg Leu Gln 795 800 805 aga gaa aaa gac tgt gtc atc aag aga att agg agt gac cat ggc aga 3582 Arg Glu Lys Asp Cys Val Ile Lys Arg Ile Arg Ser Asp His Gly Arg 810 815 820 gag ttt gaa aac agc aag ttt act gaa ttc tgc aca tct gaa ggc atc 3630 Glu Phe Glu Asn Ser Lys Phe Thr Glu Phe Cys Thr Ser Glu Gly Ile 825 830 835 840 act cat gag ttc tct gca gcc att aca cca caa caa aat ggc ata gtt 3678 Thr His Glu Phe Ser Ala Ala Ile Thr Pro Gln Gln Asn Gly Ile Val 845 850 855 gaa agg aaa aac agg act ttg caa gaa gct act agg gtc atg ctt cat 3726 Glu Arg Lys Asn Arg Thr Leu Gln Glu Ala Thr Arg Val Met Leu His 860 865 870 gcc aaa gaa ctt ccc tat aat ctc tgg gct gaa gcc atg aac aca gca 3774 Ala Lys Glu Leu Pro Tyr Asn Leu Trp Ala Glu Ala Met Asn Thr Ala 875 880 885 tgc tat atc cac aac aga gtc aca ctt aga aga ggg act cca acc aca 3822 Cys Tyr Ile His Asn Arg Val Thr Leu Arg Arg Gly Thr Pro Thr Thr 890 895 900 ctg tat gaa atc tgg aaa ggg agg aag cca act gtc aag cac ttc cac 3870 Leu Tyr Glu Ile Trp Lys Gly Arg Lys Pro Thr Val Lys His Phe His 905 910 915 920 atc ttt gga agt cca tgt tac att ttg gca gat aga gag caa agg aga 3918 Ile Phe Gly Ser Pro Cys Tyr Ile Leu Ala Asp Arg Glu Gln Arg Arg 925 930 935 aag atg gat ccc aag agt gat gca gga ata ttc ttg gga tac tct aca 3966 Lys Met Asp Pro Lys Ser Asp Ala Gly Ile Phe Leu Gly Tyr Ser Thr 940 945 950 aac agc aga gca tat aga gta ttc aat tcc aga acc aga act gtg atg 4014 Asn Ser Arg Ala Tyr Arg Val Phe Asn Ser Arg Thr Arg Thr Val Met 955 960 965 gaa tcc atc aat gtg gtt gtt gat gat cta act cca gca aga aag aag 4062 Glu Ser Ile Asn Val Val Val Asp Asp Leu Thr Pro Ala Arg Lys Lys 970 975 980 gat gtc gaa gaa gat gtc aga aca tcg gaa gac aat gta gca gat aca 4110 Asp Val Glu Glu Asp Val Arg Thr Ser Glu Asp Asn Val Ala Asp Thr 985 990 995 1000 gct aaa agt gca gaa aat gca gaa aaa tct gat tct act aca gat 4155 Ala Lys Ser Ala Glu Asn Ala Glu Lys Ser Asp Ser Thr Thr Asp 1005 1010 1015 gaa cca aac atc aat caa cct gac aag agt ccc ttc att aga atc 4200 Glu Pro Asn Ile Asn Gln Pro Asp Lys Ser Pro
Phe Ile Arg Ile 1020 1025 1030 cag aag atg caa ccc aag gag ctg att ata gga gat cca aac aga 4245 Gln Lys Met Gln Pro Lys Glu Leu Ile Ile Gly Asp Pro Asn Arg 1035 1040 1045 gga gtc act aca aga tca agg gag att gag att gtc tcc aat tca 4290 Gly Val Thr Thr Arg Ser Arg Glu Ile Glu Ile Val Ser Asn Ser 1050 1055 1060 tgt ttt gtc tcc aaa att gag ccc aag aat gtg aaa gag gca ctg 4335 Cys Phe Val Ser Lys Ile Glu Pro Lys Asn Val Lys Glu Ala Leu 1065 1070 1075 act gat gag ttc tgg atc aat gct atg caa gaa gaa ttg gag caa 4380 Thr Asp Glu Phe Trp Ile Asn Ala Met Gln Glu Glu Leu Glu Gln 1080 1085 1090 ttc aaa agg aat gaa gtt tgg gag cta gtt cct aga ccc gag gga 4425 Phe Lys Arg Asn Glu Val Trp Glu Leu Val Pro Arg Pro Glu Gly 1095 1100 1105 act aat gtg att ggc acc aag tgg atc ttc aag aac aaa acc aat 4470 Thr Asn Val Ile Gly Thr Lys Trp Ile Phe Lys Asn Lys Thr Asn 1110 1115 1120 gaa gaa ggt gtt ata acc aga aac aag gcc aga ctt gtt gct caa 4515 Glu Glu Gly Val Ile Thr Arg Asn Lys Ala Arg Leu Val Ala Gln 1125 1130 1135 ggc tac act cag att gaa ggt gta gac ttt gat gaa act ttc gcc 4560 Gly Tyr Thr Gln Ile Glu Gly Val Asp Phe Asp Glu Thr Phe Ala 1140 1145 1150 cct gtt gct aga ctt gag tcc atc aga ttg tta ctt ggt gta gct 4605 Pro Val Ala Arg Leu Glu Ser Ile Arg Leu Leu Leu Gly Val Ala 1155 1160 1165 tgc atc ctc aaa ttc aag ttg tac cag atg gat gtg aag agc gcg 4650 Cys Ile Leu Lys Phe Lys Leu Tyr Gln Met Asp Val Lys Ser Ala 1170 1175 1180 ttt ctg aat gga tac ctg aat gaa gaa gcc tat gtg gag cag cca 4695 Phe Leu Asn Gly Tyr Leu Asn Glu Glu Ala Tyr Val Glu Gln Pro 1185 1190 1195 aag gga ttt gta gat cca act cat cta gat cat gta tac agg ctc 4740 Lys Gly Phe Val Asp Pro Thr His Leu Asp His Val Tyr Arg Leu 1200 1205 1210 aag aag gct ctc tat gga ttg aag caa gct cca aga gct tgg tat 4785 Lys Lys Ala Leu Tyr Gly Leu Lys Gln Ala Pro Arg Ala Trp Tyr 1215 1220 1225 gaa agg cta aca gag ttc ctt act cag caa ggg tat agg aag gga 4830 Glu Arg Leu Thr Glu Phe Leu Thr Gln Gln Gly Tyr Arg Lys Gly 1230 1235 1240 gga att gac aag act ctc ttt gtc aaa
caa gat gct gaa aac ttg 4875 Gly Ile Asp Lys Thr Leu Phe Val Lys Gln Asp Ala Glu Asn Leu 1245 1250 1255 atg ata gca cag ata tat gtt gat gac att gtg ttt gga ggg atg 4920 Met Ile Ala Gln Ile Tyr Val Asp Asp Ile Val Phe Gly Gly Met 1260 1265 1270 tcg aat gag atg ctt cga cat ttt gtc cca cag atg caa tct gaa 4965 Ser Asn Glu Met Leu Arg His Phe Val Pro Gln Met Gln Ser Glu 1275 1280 1285 ttt gag atg agt ctt gtt gga gag ctg cat tat ttt ctg gga ctc 5010 Phe Glu Met Ser Leu Val Gly Glu Leu His Tyr Phe Leu Gly Leu 1290 1295 1300 caa gtg aag cag atg gaa gac tcc ata ttc ctc tca caa agc aag 5055 Gln Val Lys Gln Met Glu Asp Ser Ile Phe Leu Ser Gln Ser Lys 1305 1310 1315 tat gca aag aac att gtc aag aag ttt ggg atg gaa aat gcc agc 5100 Tyr Ala Lys Asn Ile Val Lys Lys Phe Gly Met Glu Asn Ala Ser 1320 1325 1330 cat aaa aga aca cct gca cct act cac ttg aag ctg tca aaa gat 5145 His Lys Arg Thr Pro Ala Pro Thr His Leu Lys Leu Ser Lys Asp 1335 1340 1345 gaa gct ggc acc agt gtt gat caa aat ctg tac aga agc atg att 5190 Glu Ala Gly Thr Ser Val Asp Gln Asn Leu Tyr Arg Ser Met Ile 1350 1355 1360 ggg agc tta cta tat tta aca gca agc aga cct gac atc acc ttt 5235 Gly Ser Leu Leu Tyr Leu Thr Ala Ser Arg Pro Asp Ile Thr Phe 1365 1370 1375 gca gta ggt gtt tgt gca aga tat caa gcc aat cct aag ata agt 5280 Ala Val Gly Val Cys Ala Arg Tyr Gln Ala Asn Pro Lys Ile Ser 1380 1385 1390 cac ttg aat caa gta aag aga att ctg aaa tat gta aat ggc acc 5325 His Leu Asn Gln Val Lys Arg Ile Leu Lys Tyr Val Asn Gly Thr 1395 1400 1405 agt gac tat ggg att atg tac tgt cat tgt tca gat tca atg ctg 5370 Ser Asp Tyr Gly Ile Met Tyr Cys His Cys Ser Asp Ser Met Leu 1410 1415 1420 gtt ggg tat tgt gat gct gat tgg gct gga agt gca gat gac aga 5415 Val Gly Tyr Cys Asp Ala Asp Trp Ala Gly Ser Ala Asp Asp Arg 1425 1430 1435 aaa tgc act tct ggt gga tgt ttc tat ttg gga acc aat ctt att 5460 Lys Cys Thr Ser Gly Gly Cys Phe Tyr Leu Gly Thr Asn Leu Ile 1440 1445 1450 tca tgg ttc agc aag aag cag aac tgt gtg tcc cta tct act gct 5505 Ser Trp Phe Ser Lys Lys Gln Asn Cys Val
Ser Leu Ser Thr Ala 1455 1460 1465 gaa gca gag tat att gca gca gga agc agt tgt tca caa cta gtt 5550 Glu Ala Glu Tyr Ile Ala Ala Gly Ser Ser Cys Ser Gln Leu Val 1470 1475 1480 tgg atg aag cag atg ctg aag gag tac aat gtc gaa caa gat gtc 5595 Trp Met Lys Gln Met Leu Lys Glu Tyr Asn Val Glu Gln Asp Val 1485 1490 1495 atg aca ttg tac tgt gac aac atg agt gct att aat att tct aaa 5640 Met Thr Leu Tyr Cys Asp Asn Met Ser Ala Ile Asn Ile Ser Lys 1500 1505 1510 aat cct gtt caa cac aac aga acc aag cac att gac att aga cat 5685 Asn Pro Val Gln His Asn Arg Thr Lys His Ile Asp Ile Arg His 1515 1520 1525 cac tat att aga gat ctt gtt gat gat aaa att atc aca ctg gag 5730 His Tyr Ile Arg Asp Leu Val Asp Asp Lys Ile Ile Thr Leu Glu 1530 1535 1540 cat gtt gac act gag gaa caa gta gca gat att ttc aca aag gca 5775 His Val Asp Thr Glu Glu Gln Val Ala Asp Ile Phe Thr Lys Ala 1545 1550 1555 ttg gat gca aat cag ttt gaa aaa ctg agg ggc aag ctg ggc act 5820 Leu Asp Ala Asn Gln Phe Glu Lys Leu Arg Gly Lys Leu Gly Thr 1560 1565 1570 tgt ctg cta gag gat tta tag caa tta ctt cta tct gaa cgt gtt 5865 Cys Leu Leu Glu Asp Leu Gln Leu Leu Leu Ser Glu Arg Val 1575 1580 caa acg tta ata gca cgt tct cta ctg ggc caa aac aaa ttc gac 5910 Gln Thr Leu Ile Ala Arg Ser Leu Leu Gly Gln Asn Lys Phe Asp 1585 1590 1595 cgt tgc ttc aca cgt ccc tct aca ttc ctc att caa act tac att 5955 Arg Cys Phe Thr Arg Pro Ser Thr Phe Leu Ile Gln Thr Tyr Ile 1600 1605 1610 ttc gtg gta atc tcg ttt tca ttc acc aac acc tct cag ata ttc 6000 Phe Val Val Ile Ser Phe Ser Phe Thr Asn Thr Ser Gln Ile Phe 1615 1620 1625 acg aaa cct ttt caa aag ctc tgc ttc tcc atg gct acc tca cca 6045 Thr Lys Pro Phe Gln Lys Leu Cys Phe Ser Met Ala Thr Ser Pro 1630 1635 1640 aaa gaa act tca tct cct gtt tca ccc tct gta cca tca cct cca 6090 Lys Glu Thr Ser Ser Pro Val Ser Pro Ser Val Pro Ser Pro Pro 1645 1650 1655 tca tcc acc aaa gca cca tca aac cag gaa caa cct gaa ttc aat 6135 Ser Ser Thr Lys Ala Pro Ser Asn Gln Glu Gln Pro Glu Phe Asn 1660 1665 1670 atc caa ccc ata caa atg att cct ggt cca
gcc cct gtt cct gag 6180 Ile Gln Pro Ile Gln Met Ile Pro Gly Pro Ala Pro Val Pro Glu 1675 1680 1685 aaa ctg gtc ccc aaa aga caa cag gga gtg aag att tct gaa aac 6225 Lys Leu Val Pro Lys Arg Gln Gln Gly Val Lys Ile Ser Glu Asn 1690 1695 1700 cct agc ctt gca aca agt cct agg gaa gta gac acg gag atg gat 6270 Pro Ser Leu Ala Thr Ser Pro Arg Glu Val Asp Thr Glu Met Asp 1705 1710 1715 aag aag atc cgc agt att gtg agt agc att ttg aaa aat gct tct 6315 Lys Lys Ile Arg Ser Ile Val Ser Ser Ile Leu Lys Asn Ala Ser 1720 1725 1730 gtc cct gat gct gat aaa gat gtt cca aca tct tcc acc cca aat 6360 Val Pro Asp Ala Asp Lys Asp Val Pro Thr Ser Ser Thr Pro Asn 1735 1740 1745 gct gaa gtc ctc tct tca tcc agt aaa gag aaa tca aca gag gaa 6405 Ala Glu Val Leu Ser Ser Ser Ser Lys Glu Lys Ser Thr Glu Glu 1750 1755 1760 gag gat caa gcc aca gag gag acc cct gca cca agg gca cca gaa 6450 Glu Asp Gln Ala Thr Glu Glu Thr Pro Ala Pro Arg Ala Pro Glu 1765 1770 1775 cct gct cca ggt gac ctc att gat cta gaa gag gta gaa tct gat 6495 Pro Ala Pro Gly Asp Leu Ile Asp Leu Glu Glu Val Glu Ser Asp 1780 1785 1790 gag gaa ccc att gtc aaa aag ttg gca ctt ggc att gca gaa aga 6540 Glu Glu Pro Ile Val Lys Lys Leu Ala Leu Gly Ile Ala Glu Arg 1795 1800 1805 tta caa agc aga aag gga aaa acc ccc att act agg tct gga cga 6585 Leu Gln Ser Arg Lys Gly Lys Thr Pro Ile Thr Arg Ser Gly Arg 1810 1815 1820 atc aaa act att gca cag aag aag agc aca cca atc act cct acc 6630 Ile Lys Thr Ile Ala Gln Lys Lys Ser Thr Pro Ile Thr Pro Thr 1825 1830 1835 aca tcc aga tgg agc aaa gtt gca atc cct tcc aag aag agg aaa 6675 Thr Ser Arg Trp Ser Lys Val Ala Ile Pro Ser Lys Lys Arg Lys 1840 1845 1850 gaa att tcc tca tct gat tct gat gat gat gtc gaa cta gat gtt 6720 Glu Ile Ser Ser Ser Asp Ser Asp Asp Asp Val Glu Leu Asp Val 1855 1860 1865 ccc gac atc aag aga gcc aag aaa tca ggg aaa aag gtg cct gga 6765 Pro Asp Ile Lys Arg Ala Lys Lys Ser Gly Lys Lys Val Pro Gly 1870 1875 1880 aat gtc cct gat gcc cca ttg gac aac att tca ttc cac tcc att 6810 Asn Val Pro Asp Ala Pro Leu Asp Asn Ile Ser
Phe His Ser Ile 1885 1890 1895 ggc aat gtt gaa agg tgg aaa ttt gta tat caa cgc aga ctt gct 6855 Gly Asn Val Glu Arg Trp Lys Phe Val Tyr Gln Arg Arg Leu Ala 1900 1905 1910 tta gaa aga gaa ctg gga aga gat gcc ttg gat tgc aag gag atc 6900 Leu Glu Arg Glu Leu Gly Arg Asp Ala Leu Asp Cys Lys Glu Ile 1915 1920 1925 atg gac ctc atc aag gct gct gga ctg ctg aaa aca gtc acc aag 6945 Met Asp Leu Ile Lys Ala Ala Gly Leu Leu Lys Thr Val Thr Lys 1930 1935 1940 ttg gga gat tgt tat gaa agt cta gtc agg gaa ttc att gtc aac 6990 Leu Gly Asp Cys Tyr Glu Ser Leu Val Arg Glu Phe Ile Val Asn 1945 1950 1955 att ccc tct gac ata aca aac aga aag agt gat gag tat cag aca 7035 Ile Pro Ser Asp Ile Thr Asn Arg Lys Ser Asp Glu Tyr Gln Thr 1960 1965 1970 gtg ttt gtc aga gga aaa ggt att aga ttc tcc cct gct gta atc 7080 Val Phe Val Arg Gly Lys Gly Ile Arg Phe Ser Pro Ala Val Ile 1975 1980 1985 aac aaa tac ctg ggc aga cca act gaa gga gtg gtg gat att gct 7125 Asn Lys Tyr Leu Gly Arg Pro Thr Glu Gly Val Val Asp Ile Ala 1990 1995 2000 gtt tct gag cat caa att gcc aag gaa atc act gcc aaa caa gtc 7170 Val Ser Glu His Gln Ile Ala Lys Glu Ile Thr Ala Lys Gln Val 2005 2010 2015 cag cat tgg cca aag aaa ggg aag ctt tct gca ggg aag cta agt 7215 Gln His Trp Pro Lys Lys Gly Lys Leu Ser Ala Gly Lys Leu Ser 2020 2025 2030 gtg aag tat gca atc ctg cat agg att ggc act gca aac tgg gta 7260 Val Lys Tyr Ala Ile Leu His Arg Ile Gly Thr Ala Asn Trp Val 2035 2040 2045 ccc acc aat cat act tcc act gtt gcc aca ggt ttg ggt aaa ttt 7305 Pro Thr Asn His Thr Ser Thr Val Ala Thr Gly Leu Gly Lys Phe 2050 2055 2060 ctg tat gct gtt gga acc aag tcc aaa ttt aat ttt gga aac tat 7350 Leu Tyr Ala Val Gly Thr Lys Ser Lys Phe Asn Phe Gly Asn Tyr 2065 2070 2075 att ttt gat caa act gtt aag cat tca gaa tct ttt gct gtc aaa 7395 Ile Phe Asp Gln Thr Val Lys His Ser Glu Ser Phe Ala Val Lys 2080 2085 2090 tta ccc att gcc ttc cca act gta ttg tgt ggc att atg ttg agt 7440 Leu Pro Ile Ala Phe Pro Thr Val Leu Cys Gly Ile Met Leu Ser 2095 2100 2105 caa cat ccc aat att tta aac aac att
gac tct gtg aag aag aga 7485 Gln His Pro Asn Ile Leu Asn Asn Ile Asp Ser Val Lys Lys Arg 2110 2115 2120 gaa tct gct cta tcc ctg cat tac aaa ctg ttt gag ggg aca cat 7530 Glu Ser Ala Leu Ser Leu His Tyr Lys Leu Phe Glu Gly Thr His 2125 2130 2135 gtc cca gac att gtc tcg aca tca ggg aaa gct gct gct tca ggt 7575 Val Pro Asp Ile Val Ser Thr Ser Gly Lys Ala Ala Ala Ser Gly 2140 2145 2150 gct gtg acc aag gat gct ttg att gct gaa ctc aag gac aca tgc 7620 Ala Val Thr Lys Asp Ala Leu Ile Ala Glu Leu Lys Asp Thr Cys 2155 2160 2165 aag gtg ctg gag gca acc atc aaa gcc acc aca gag aag aaa atg 7665 Lys Val Leu Glu Ala Thr Ile Lys Ala Thr Thr Glu Lys Lys Met 2170 2175 2180 gag ctg gaa cgc ctg atc aaa aga ctc tca gac agt ggc att gat 7710 Glu Leu Glu Arg Leu Ile Lys Arg Leu Ser Asp Ser Gly Ile Asp 2185 2190 2195 gat gga gaa gca gct gag gaa gaa gaa gaa gca gct gag gag gaa 7755 Asp Gly Glu Ala Ala Glu Glu Glu Glu Glu Ala Ala Glu Glu Glu 2200 2205 2210 gaa gat gca gca gag gat aca gaa tca gat gat gat gat tct gat 7800 Glu Asp Ala Ala Glu Asp Thr Glu Ser Asp Asp Asp Asp Ser Asp 2215 2220 2225 gcc acc cca tgaccatcag acctttattt ttgcttttac ttttactagt 7849 Ala Thr Pro 2230 tattggtctg taatatttgc acattaattt catgcattct acttttgcca aattctgtct 7909 aaaaaggggg agtagtagga tattatatta tgcatgattt atgattttga gggggagtag 7969 tagttatatg attttgaggg ggagtagtat ttatactact gctgctgatg atgattgatg 8029 taagctacta aaactagtag ctgatagaag atcgccgcag tgaactgctt cacagcagta 8089 ggagcatgga gacaggggga gcagaaagct gatgtcacgt gagatgtctt gacatcctgg 8149 aaacgacttg caacttgcag aattttgctg tcgccactac agataccgct gtgcttgatt 8209 actctgatag tgaaagttgc tgatcccact tgcataactg ctcgtacctg ctcaggaagt 8269 gtctaagtat gttttagaca aaatttgcca aagggggaga ttgttagtgc ttagctttac 8329 tgagttttaa aagattggct aaaattttgt taaaacataa gcacttagac aatgaaggaa 8389 agctggagtt gctgcacatg atgtctaaca ttatgtcaag gaatcagatc gggttgcaca 8449 atgcacaagg caagataaaa tgtcaaatga agaattgaag ctgcaggatc cacgatgtcg 8509 gatacaatgt gcaggacatc ctgcccgaaa atactggaca cataaatctg ttatatcttt 8569 aacagattaa
tgtgcagtca gcaacagatt aggcgatcta tctttaggaa cgaattaaaa 8629 gaaaattaaa gttcgaatta caaacttgaa tagttcgttc agggattaaa gattaaagat 8689 aaaaactaaa agatcaaact ttatctttga gatctttaag tgcagatttt caggagaatg 8749 atagatctta tccagcgcaa gttgttgcag cccagatacg cacactgcta tataaacatg 8809 aaggctgcac gagttttcta ccaagtccga gattgaagag ttattttgtg agttttggga 8869 cttgagtgtt ttgtgagcca ccttgatgtt accctaacat caagtgttgg acctgagtgt 8929 gtagagttga tctctattgt tcagagagca atctctggtg tgtctttgat ttatttgtaa 8989 acacgggaga gtgattgaga gggagtgaga ggggttctca tatctaagag tggctcttag 9049 gtagaggttg catgggtagt ggttaggtga gaaggttgta aacagtggct gttagatctt 9109 cgaactaaca ctattttagt ggatttcctc cctggcttgg tagcccccag atgtaggtga 9169 cgttgcacca aactgggtta acaattctct tgtgttattt acttgtttaa tctgttcata 9229 ctgtcaaata taatctgcat gttctgaagc gtgatgtcgt gacatccggt acgacatctg 9289 tcattggtat cagaatttca cacatatatc tttgatacat gtatacgtct ttgtgagagc 9349 tatagtaat 9358 91 1576 PRT Glycine max misc_feature Soybean retroelement SIRE1 8 91 Met Asn Met Glu Lys Glu Gly Gly Pro Val Asn Arg Pro Pro Ile Leu 1 5 10 15 Asp Gly Thr Asn Tyr Glu Tyr Trp Lys Ala Arg Met Val Ala Phe Leu 20 25 30 Lys Ser Leu Asp Ser Arg Thr Trp Lys Ala Val Ile Lys Gly Trp Glu 35 40 45 His Pro Lys Met Leu Asp Thr Glu Gly Lys Pro Thr Asn Glu Leu Lys 50 55 60 Pro Glu Glu Asp Trp Thr Lys Glu Glu Asp Glu Leu Ala Leu Gly Asn 65 70 75 80 Ser Lys Ala Leu Asn Ala Leu Phe Asn Gly Val Asp Lys Asn Ile Phe 85 90 95 Arg Leu Ile Asn Thr Cys Thr Val Ala Lys Asp Ala Cys Gly Glu Ile 100 105 110 Leu Lys Thr Thr His Glu Gly Thr Ser Lys Val Lys Met Ser Arg Leu 115 120 125 Gln Leu Leu Ala Thr Lys Phe Glu Asn Leu Lys Met Lys Glu Glu Glu 130 135 140 Cys Ile His Asp Phe His Met Asn Ile Leu Glu Ile Ala Asn Ala Cys 145 150 155 160 Thr Ala Leu Gly Glu Arg Met Thr Asp Glu Lys Leu Val Arg Lys Ile 165 170 175 Leu Arg Ser Leu Pro Lys Arg Phe Asp Met Lys Val Thr Ala Ile Glu 180 185 190 Glu Ala Gln Asp Ile Cys Asn Met Arg Val Asp Glu Leu Ile Gly Ser 195 200 205 Leu Gln Thr Phe Glu Leu Gly Leu Ser Asp Arg Asn Glu
Lys Lys Ser 210 215 220 Lys Asn Leu Ala Phe Val Ser Asn Asp Glu Gly Glu Glu Asp Glu Tyr 225 230 235 240 Asp Leu Asp Thr Asp Glu Gly Leu Thr Asn Ala Val Gly Leu Leu Gly 245 250 255 Lys Gln Phe Asn Lys Val Leu Asn Arg Met Asp Arg Arg Gln Lys Pro 260 265 270 His Val Arg Asn Ile Pro Phe Asp Ile Arg Lys Gly Ser Glu Tyr His 275 280 285 Lys Lys Ser Asp Glu Lys Pro Ser His Ser Lys Gly Ile Gln Cys His 290 295 300 Gly Cys Glu Gly Tyr Gly His Ile Lys Ala Glu Cys Pro Thr His Leu 305 310 315 320 Lys Lys Gln Arg Lys Gly Leu Ser Val Cys Arg Ser Asp Asp Thr Glu 325 330 335 Ser Glu Gln Glu Ser Asp Ser Asp Arg Asp Val Asn Ala Leu Thr Gly 340 345 350 Arg Phe Glu Ser Asp Glu Asp Ser Ser Asp Ile Glu Ile Thr Phe Asp 355 360 365 Glu Leu Ala Ile Ser Tyr Arg Lys Leu Cys Ile Lys Ser Glu Lys Ile 370 375 380 Leu Gln Gln Glu Ala Gln Leu Lys Lys Val Ile Ala Asn Leu Glu Ala 385 390 395 400 Glu Lys Glu Ala His Glu Glu Glu Ile Ser Glu Leu Lys Gly Glu Val 405 410 415 Gly Phe Leu Asn Ser Lys Leu Glu Asn Met Thr Lys Ser Ile Lys Met 420 425 430 Leu Asn Lys Gly Ser Asp Met Leu Asp Glu Val Leu Gln Leu Gly Lys 435 440 445 Asn Val Gly Asn Gln Arg Gly Leu Gly Phe Asn His Lys Ser Ala Cys 450 455 460 Arg Ile Thr Met Thr Glu Phe Val Pro Ala Lys Asn Ser Thr Gly Ala 465 470 475 480 Thr Met Ser Gln His Arg Ser Arg His His Gly Thr Gln Gln Lys Lys 485 490 495 Ser Lys Arg Lys Lys Trp Arg Cys His Tyr Cys Gly Lys Tyr Gly His 500 505 510 Ile Lys Pro Phe Cys Tyr His Leu His Gly His Pro His His Gly Thr 515 520 525 Gln Ser Ser Ser Ser Gly Arg Lys Met Met Trp Val Pro Lys His Lys 530 535 540 Ile Val Ser Leu Val Val His Thr Ser Leu Arg Ala Ser Ala Lys Glu 545 550 555 560 Asp Trp Tyr Leu Asp Ser Gly Cys Ser Arg His Met Thr Gly Val Lys 565 570 575 Glu Phe Leu Val Asn Ile Glu Pro Cys Ser Thr Ser Tyr Val Thr Phe 580 585 590 Gly Asp Gly Ser Lys Gly Lys Ile Thr Gly Met Gly Lys Leu Val His 595 600 605 Asp Gly Leu Pro Ser Leu Asn Lys Val Leu Leu Val Lys Gly Leu Thr 610 615 620 Ala Asn Leu Ile Ser Ile Ser Gln Leu Cys Asp Glu Gly Phe Asn Val 625 630 635 640 Asn
Phe Thr Lys Ser Glu Cys Leu Val Thr Asn Glu Lys Ser Glu Val 645 650 655 Leu Met Lys Gly Ser Arg Ser Lys Asp Asn Cys Tyr Leu Trp Thr Pro 660 665 670 Gln Glu Thr Ser Tyr Ser Ser Thr Cys Leu Ser Ser Lys Glu Asp Glu 675 680 685 Val Lys Ile Trp His Gln Arg Phe Gly His Leu His Leu Arg Gly Met 690 695 700 Lys Lys Ile Ile Asp Lys Gly Ala Val Arg Gly Ile Pro Asn Leu Lys 705 710 715 720 Ile Glu Glu Gly Arg Ile Cys Gly Glu Cys Gln Ile Gly Lys Gln Val 725 730 735 Lys Met Ser His Gln Lys Leu Gln His Gln Thr Thr Ser Met Val Leu 740 745 750 Glu Leu Leu His Met Asp Leu Met Gly Pro Met Gln Val Glu Ser Leu 755 760 765 Gly Gly Lys Arg Tyr Ala Tyr Val Val Val Asp Asp Phe Ser Arg Phe 770 775 780 Thr Trp Val Asn Phe Ile Arg Glu Lys Ser Asp Thr Phe Glu Val Phe 785 790 795 800 Lys Glu Leu Ser Leu Arg Leu Gln Arg Glu Lys Asp Cys Val Ile Lys 805 810 815 Arg Ile Arg Ser Asp His Gly Arg Glu Phe Glu Asn Ser Lys Phe Thr 820 825 830 Glu Phe Cys Thr Ser Glu Gly Ile Thr His Glu Phe Ser Ala Ala Ile 835 840 845 Thr Pro Gln Gln Asn Gly Ile Val Glu Arg Lys Asn Arg Thr Leu Gln 850 855 860 Glu Ala Thr Arg Val Met Leu His Ala Lys Glu Leu Pro Tyr Asn Leu 865 870 875 880 Trp Ala Glu Ala Met Asn Thr Ala Cys Tyr Ile His Asn Arg Val Thr 885 890 895 Leu Arg Arg Gly Thr Pro Thr Thr Leu Tyr Glu Ile Trp Lys Gly Arg 900 905 910 Lys Pro Thr Val Lys His Phe His Ile Phe Gly Ser Pro Cys Tyr Ile 915 920 925 Leu Ala Asp Arg Glu Gln Arg Arg Lys Met Asp Pro Lys Ser Asp Ala 930 935 940 Gly Ile Phe Leu Gly Tyr Ser Thr Asn Ser Arg Ala Tyr Arg Val Phe 945 950 955 960 Asn Ser Arg Thr Arg Thr Val Met Glu Ser Ile Asn Val Val Val Asp 965 970 975 Asp Leu Thr Pro Ala Arg Lys Lys Asp Val Glu Glu Asp Val Arg Thr 980 985 990 Ser Glu Asp Asn Val Ala Asp Thr Ala Lys Ser Ala Glu Asn Ala Glu 995 1000 1005 Lys Ser Asp Ser Thr Thr Asp Glu Pro Asn Ile Asn Gln Pro Asp 1010 1015 1020 Lys Ser Pro Phe Ile Arg Ile Gln Lys Met Gln Pro Lys Glu Leu 1025 1030 1035 Ile Ile Gly Asp Pro Asn Arg Gly Val Thr Thr Arg Ser Arg Glu 1040 1045 1050 Ile Glu Ile Val Ser Asn Ser Cys Phe Val
Ser Lys Ile Glu Pro 1055 1060 1065 Lys Asn Val Lys Glu Ala Leu Thr Asp Glu Phe Trp Ile Asn Ala 1070 1075 1080 Met Gln Glu Glu Leu Glu Gln Phe Lys Arg Asn Glu Val Trp Glu 1085 1090 1095 Leu Val Pro Arg Pro Glu Gly Thr Asn Val Ile Gly Thr Lys Trp 1100 1105 1110 Ile Phe Lys Asn Lys Thr Asn Glu Glu Gly Val Ile Thr Arg Asn 1115 1120 1125 Lys Ala Arg Leu Val Ala Gln Gly Tyr Thr Gln Ile Glu Gly Val 1130 1135 1140 Asp Phe Asp Glu Thr Phe Ala Pro Val Ala Arg Leu Glu Ser Ile 1145 1150 1155 Arg Leu Leu Leu Gly Val Ala Cys Ile Leu Lys Phe Lys Leu Tyr 1160 1165 1170 Gln Met Asp Val Lys Ser Ala Phe Leu Asn Gly Tyr Leu Asn Glu 1175 1180 1185 Glu Ala Tyr Val Glu Gln Pro Lys Gly Phe Val Asp Pro Thr His 1190 1195 1200 Leu Asp His Val Tyr Arg Leu Lys Lys Ala Leu Tyr Gly Leu Lys 1205 1210 1215 Gln Ala Pro Arg Ala Trp Tyr Glu Arg Leu Thr Glu Phe Leu Thr 1220 1225 1230 Gln Gln Gly Tyr Arg Lys Gly Gly Ile Asp Lys Thr Leu Phe Val 1235 1240 1245 Lys Gln Asp Ala Glu Asn Leu Met Ile Ala Gln Ile Tyr Val Asp 1250 1255 1260 Asp Ile Val Phe Gly Gly Met Ser Asn Glu Met Leu Arg His Phe 1265 1270 1275 Val Pro Gln Met Gln Ser Glu Phe Glu Met Ser Leu Val Gly Glu 1280 1285 1290 Leu His Tyr Phe Leu Gly Leu Gln Val Lys Gln Met Glu Asp Ser 1295 1300 1305 Ile Phe Leu Ser Gln Ser Lys Tyr Ala Lys Asn Ile Val Lys Lys 1310 1315 1320 Phe Gly Met Glu Asn Ala Ser His Lys Arg Thr Pro Ala Pro Thr 1325 1330 1335 His Leu Lys Leu Ser Lys Asp Glu Ala Gly Thr Ser Val Asp Gln 1340 1345 1350 Asn Leu Tyr Arg Ser Met Ile Gly Ser Leu Leu Tyr Leu Thr Ala 1355 1360 1365 Ser Arg Pro Asp Ile Thr Phe Ala Val Gly Val Cys Ala Arg Tyr 1370 1375 1380 Gln Ala Asn Pro Lys Ile Ser His Leu Asn Gln Val Lys Arg Ile 1385 1390 1395 Leu Lys Tyr Val Asn Gly Thr Ser Asp Tyr Gly Ile Met Tyr Cys 1400 1405 1410 His Cys Ser Asp Ser Met Leu Val Gly Tyr Cys Asp Ala Asp Trp 1415 1420 1425 Ala Gly Ser Ala Asp Asp Arg Lys Cys Thr Ser Gly Gly Cys Phe 1430 1435 1440 Tyr Leu Gly Thr Asn Leu Ile Ser Trp Phe Ser Lys Lys Gln Asn 1445 1450 1455 Cys Val Ser Leu Ser Thr Ala Glu Ala Glu
Tyr Ile Ala Ala Gly 1460 1465 1470 Ser Ser Cys Ser Gln Leu Val Trp Met Lys Gln Met Leu Lys Glu 1475 1480 1485 Tyr Asn Val Glu Gln Asp Val Met Thr Leu Tyr Cys Asp Asn Met 1490 1495 1500 Ser Ala Ile Asn Ile Ser Lys Asn Pro Val Gln His Asn Arg Thr 1505 1510 1515 Lys His Ile Asp Ile Arg His His Tyr Ile Arg Asp Leu Val Asp 1520 1525 1530 Asp Lys Ile Ile Thr Leu Glu His Val Asp Thr Glu Glu Gln Val 1535 1540 1545 Ala Asp Ile Phe Thr Lys Ala Leu Asp Ala Asn Gln Phe Glu Lys 1550 1555 1560 Leu Arg Gly Lys Leu Gly Thr Cys Leu Leu Glu Asp Leu 1565 1570 1575 92 656 PRT Glycine max misc_feature Soybean retroelement SIRE1 8 92 Gln Leu Leu Leu Ser Glu Arg Val Gln Thr Leu Ile Ala Arg Ser Leu 1 5 10 15 Leu Gly Gln Asn Lys Phe Asp Arg Cys Phe Thr Arg Pro Ser Thr Phe 20 25 30 Leu Ile Gln Thr Tyr Ile Phe Val Val Ile Ser Phe Ser Phe Thr Asn 35 40 45 Thr Ser Gln Ile Phe Thr Lys Pro Phe Gln Lys Leu Cys Phe Ser Met 50 55 60 Ala Thr Ser Pro Lys Glu Thr Ser Ser Pro Val Ser Pro Ser Val Pro 65 70 75 80 Ser Pro Pro Ser Ser Thr Lys Ala Pro Ser Asn Gln Glu Gln Pro Glu 85 90 95 Phe Asn Ile Gln Pro Ile Gln Met Ile Pro Gly Pro Ala Pro Val Pro 100 105 110 Glu Lys Leu Val Pro Lys Arg Gln Gln Gly Val Lys Ile Ser Glu Asn 115 120 125 Pro Ser Leu Ala Thr Ser Pro Arg Glu Val Asp Thr Glu Met Asp Lys 130 135 140 Lys Ile Arg Ser Ile Val Ser Ser Ile Leu Lys Asn Ala Ser Val Pro 145 150 155 160 Asp Ala Asp Lys Asp Val Pro Thr Ser Ser Thr Pro Asn Ala Glu Val 165 170 175 Leu Ser Ser Ser Ser Lys Glu Lys Ser Thr Glu Glu Glu Asp Gln Ala 180 185 190 Thr Glu Glu Thr Pro Ala Pro Arg Ala Pro Glu Pro Ala Pro Gly Asp 195 200 205 Leu Ile Asp Leu Glu Glu Val Glu Ser Asp Glu Glu Pro Ile Val Lys 210 215 220 Lys Leu Ala Leu Gly Ile Ala Glu Arg Leu Gln Ser Arg Lys Gly Lys 225 230 235 240 Thr Pro Ile Thr Arg Ser Gly Arg Ile Lys Thr Ile Ala Gln Lys Lys 245 250 255 Ser Thr Pro Ile Thr Pro Thr Thr Ser Arg Trp Ser Lys Val Ala Ile 260 265 270 Pro Ser Lys Lys Arg Lys Glu Ile Ser Ser Ser Asp Ser Asp Asp Asp 275 280 285 Val Glu Leu Asp Val Pro Asp Ile Lys Arg Ala
Lys Lys Ser Gly Lys 290 295 300 Lys Val Pro Gly Asn Val Pro Asp Ala Pro Leu Asp Asn Ile Ser Phe 305 310 315 320 His Ser Ile Gly Asn Val Glu Arg Trp Lys Phe Val Tyr Gln Arg Arg 325 330 335 Leu Ala Leu Glu Arg Glu Leu Gly Arg Asp Ala Leu Asp Cys Lys Glu 340 345 350 Ile Met Asp Leu Ile Lys Ala Ala Gly Leu Leu Lys Thr Val Thr Lys 355 360 365 Leu Gly Asp Cys Tyr Glu Ser Leu Val Arg Glu Phe Ile Val Asn Ile 370 375 380 Pro Ser Asp Ile Thr Asn Arg Lys Ser Asp Glu Tyr Gln Thr Val Phe 385 390 395 400 Val Arg Gly Lys Gly Ile Arg Phe Ser Pro Ala Val Ile Asn Lys Tyr 405 410 415 Leu Gly Arg Pro Thr Glu Gly Val Val Asp Ile Ala Val Ser Glu His 420 425 430 Gln Ile Ala Lys Glu Ile Thr Ala Lys Gln Val Gln His Trp Pro Lys 435 440 445 Lys Gly Lys Leu Ser Ala Gly Lys Leu Ser Val Lys Tyr Ala Ile Leu 450 455 460 His Arg Ile Gly Thr Ala Asn Trp Val Pro Thr Asn His Thr Ser Thr 465 470 475 480 Val Ala Thr Gly Leu Gly Lys Phe Leu Tyr Ala Val Gly Thr Lys Ser 485 490 495 Lys Phe Asn Phe Gly Asn Tyr Ile Phe Asp Gln Thr Val Lys His Ser 500 505 510 Glu Ser Phe Ala Val Lys Leu Pro Ile Ala Phe Pro Thr Val Leu Cys 515 520 525 Gly Ile Met Leu Ser Gln His Pro Asn Ile Leu Asn Asn Ile Asp Ser 530 535 540 Val Lys Lys Arg Glu Ser Ala Leu Ser Leu His Tyr Lys Leu Phe Glu 545 550 555 560 Gly Thr His Val Pro Asp Ile Val Ser Thr Ser Gly Lys Ala Ala Ala 565 570 575 Ser Gly Ala Val Thr Lys Asp Ala Leu Ile Ala Glu Leu Lys Asp Thr 580 585 590 Cys Lys Val Leu Glu Ala Thr Ile Lys Ala Thr Thr Glu Lys Lys Met 595 600 605 Glu Leu Glu Arg Leu Ile Lys Arg Leu Ser Asp Ser Gly Ile Asp Asp 610 615 620 Gly Glu Ala Ala Glu Glu Glu Glu Glu Ala Ala Glu Glu Glu Glu Asp 625 630 635 640 Ala Ala Glu Asp Thr Glu Ser Asp Asp Asp Asp Ser Asp Ala Thr Pro 645 650 655 93 9399 DNA Glycine max misc_feature Soybean retroelement SIRE1 9 93 caagacaata aagagctctc tacatttgtg ttagtgctta gcactactga gtttaaaaag 60 gcttggctaa gattttgtta aaacataagc acttagacaa tgaaggaaag ctggagttgc 120 tgcacatgat gtccaacgtt atgtcaagga ataagatcgg gctgcataat gcacaaggca 180 agataaagtg tcaagtgatg
aattgaagtt gaaggatcca cgatgtcgga tacaatgtcc 240 tgacatcctg ctcgagaata ctggaagtgc tgtacaatgc aagataaaag tcaagtgaag 300 cattgaagct gcaggatcca agatgtcgga tacgatgtcc tgacatctgg cccgataata 360 ctggacatat aaatctgtta tatctttaac agattattgt gcagttagca agagattaga 420 agatctatct ttaggaacga attaaaagat cattaaagtt cgaatttcaa agtagaagag 480 ttcgttcagg gattaaagat taaagattaa agattcaaac taaaagatca aaagttatct 540 tttagttctt taactgcaga tttttcagaa gaagatagat ctcctccagc atcaagaact 600 tgcagcccag aatcgtacac ggctatataa tcatggaggc tgcacgagtt ctgtaccgag 660 tccgggatta aagagttatt ttgtgagttt tgggacttga gtgttttgtg agccaccttg 720 atggtatact aacatcaagt gttggacctg agtgtgtaga gttgatctct attgtgtagg 780 gttgatccct tttgtacaga gttgatctct gatgtgtctt tgaattaatt gtaaacacga 840 gagtgtgagt gagagggagt gagcagaggt tctcatatct aagattgggt cttaggtaga 900 gatcgcacgg gtagtggtta ggtgagaagg ttgtaaacag gggttgttag accttgaact 960 aacactattg agagtggatt tcctccctgg cttggtagcc cccagatgta ggtgaggttg 1020 caccgaactg ggtaaacaat tctcttgtgt tatttacttg tttaatctgt tcatacggac 1080 acacataaac tgcatgttct gaagcatgat gtcgtgacat cctgtacgac atctgtcccc 1140 tggtatcaga atttcaattg gtatcagagc caacactcga aatcacagag tgagatctgg 1200 ggagataaat tctg atg aac atg gag aaa gaa gga gga cca gtg aac aga 1250 Met Asn Met Glu Lys Glu Gly Gly Pro Val Asn Arg 1 5 10 cca cca att ctt gat gga agc aac tat gaa tac tgg aaa gca aga atg 1298 Pro Pro Ile Leu Asp Gly Ser Asn Tyr Glu Tyr Trp Lys Ala Arg Met 15 20 25 gtg gcc ttc ctc aaa tca ctg gat agc aga acc tgg aaa gct gtc atc 1346 Val Ala Phe Leu Lys Ser Leu Asp Ser Arg Thr Trp Lys Ala Val Ile 30 35 40 aaa ggc tgg gaa cat ccc aag atg ctg gac aca gaa gga aag ccc act 1394 Lys Gly Trp Glu His Pro Lys Met Leu Asp Thr Glu Gly Lys Pro Thr 45 50 55 60 gat gaa ttg aag cca gaa gaa gac tgg act aaa gaa gag gac gaa ttg 1442 Asp Glu Leu Lys Pro Glu Glu Asp Trp Thr Lys Glu Glu Asp Glu Leu 65 70 75 gca ctt gga aac tcc aaa gct ttg aat gca cta ttc aat gga gtt gac 1490 Ala Leu Gly Asn Ser Lys Ala Leu Asn Ala Leu Phe Asn Gly Val Asp 80 85 90 aag aac atc ttc
aga ctg atc aac act tgc aca gtg gcc aaa gat gca 1538 Lys Asn Ile Phe Arg Leu Ile Asn Thr Cys Thr Val Ala Lys Asp Ala 95 100 105 tgg gag atc ctg aaa atc act cat gaa gga acc tcc aaa gtg aag atg 1586 Trp Glu Ile Leu Lys Ile Thr His Glu Gly Thr Ser Lys Val Lys Met 110 115 120 tcc aga ttg caa ctc ttg gct aca aaa ttc gaa aat ctg aag atg aag 1634 Ser Arg Leu Gln Leu Leu Ala Thr Lys Phe Glu Asn Leu Lys Met Lys 125 130 135 140 gag gaa gag tgt att cat gac ttc cac atg aac att ctt gaa att gcc 1682 Glu Glu Glu Cys Ile His Asp Phe His Met Asn Ile Leu Glu Ile Ala 145 150 155 aat gct tgc act gcc ttg gga gag agg ata aca gat gaa aag ctg gtg 1730 Asn Ala Cys Thr Ala Leu Gly Glu Arg Ile Thr Asp Glu Lys Leu Val 160 165 170 aga aag atc ctc aga tcc ttg cct aag aga ttt gac atg aaa gtc act 1778 Arg Lys Ile Leu Arg Ser Leu Pro Lys Arg Phe Asp Met Lys Val Thr 175 180 185 gca ata gag gag gcc caa gac att tgc aac atg aga gtt gat gaa ctc 1826 Ala Ile Glu Glu Ala Gln Asp Ile Cys Asn Met Arg Val Asp Glu Leu 190 195 200 att ggt tct ctt caa acc ttt gag cta gga ctc tcg gat agg gct gaa 1874 Ile Gly Ser Leu Gln Thr Phe Glu Leu Gly Leu Ser Asp Arg Ala Glu 205 210 215 220 aag aag agc aag aat cta gct ttc gtg tcc aat gat gaa gga gaa gaa 1922 Lys Lys Ser Lys Asn Leu Ala Phe Val Ser Asn Asp Glu Gly Glu Glu 225 230 235 gat gag tat gac ctg gat act gat gaa ggt ctg aca aat gca gtt gtg 1970 Asp Glu Tyr Asp Leu Asp Thr Asp Glu Gly Leu Thr Asn Ala Val Val 240 245 250 ctc ctt gga aag cag ttc aac aaa gtg ctg aac aga atg gac aag agg 2018 Leu Leu Gly Lys Gln Phe Asn Lys Val Leu Asn Arg Met Asp Lys Arg 255 260 265 cag aaa cca cat gtc cag aac atc cct ttc gac atc agg aaa ggc agt 2066 Gln Lys Pro His Val Gln Asn Ile Pro Phe Asp Ile Arg Lys Gly Ser 270 275 280 aaa tac cag aaa aga tca gat gta aag ccc agt cac agc aaa gga att 2114 Lys Tyr Gln Lys Arg Ser Asp Val Lys Pro Ser His Ser Lys Gly Ile 285 290 295 300 caa tgc cat ggg tgt gaa ggc tat gga cac atc ata gct gaa tgt ccc 2162 Gln Cys His Gly Cys Glu Gly Tyr Gly His Ile Ile Ala Glu Cys Pro 305 310 315
act cat ctc aag aag cac agg aaa gga ctc tct gta tgt caa tct gat 2210 Thr His Leu Lys Lys His Arg Lys Gly Leu Ser Val Cys Gln Ser Asp 320 325 330 aca gag agt gaa caa gaa agt gat tct gac aga gat gtg aat gca ctc 2258 Thr Glu Ser Glu Gln Glu Ser Asp Ser Asp Arg Asp Val Asn Ala Leu 335 340 345 act ggg ata ttt gaa act gct gaa gat tca agt gat aca gac agt gaa 2306 Thr Gly Ile Phe Glu Thr Ala Glu Asp Ser Ser Asp Thr Asp Ser Glu 350 355 360 atc act ttt gat gag ctt gct gca tcc tat aga aaa cta tgc atc aaa 2354 Ile Thr Phe Asp Glu Leu Ala Ala Ser Tyr Arg Lys Leu Cys Ile Lys 365 370 375 380 agt gag aag atc ctt cag caa gaa gca caa ctg aag aag gtc att gca 2402 Ser Glu Lys Ile Leu Gln Gln Glu Ala Gln Leu Lys Lys Val Ile Ala 385 390 395 gat ctg gag gct gag aag gag gca cat gaa gag gag att tct gaa ctt 2450 Asp Leu Glu Ala Glu Lys Glu Ala His Glu Glu Glu Ile Ser Glu Leu 400 405 410 aaa gga gaa gtt ggt ttt ctg aac tcc aag ctg gaa acc atg aaa aaa 2498 Lys Gly Glu Val Gly Phe Leu Asn Ser Lys Leu Glu Thr Met Lys Lys 415 420 425 tca ata aag atg ctg aat aaa ggc tca gat acg ctt gat gag gtg ctg 2546 Ser Ile Lys Met Leu Asn Lys Gly Ser Asp Thr Leu Asp Glu Val Leu 430 435 440 ctg ctt ggt aag aat gct gga aac cag aga gga ctt gga ttt aat cct 2594 Leu Leu Gly Lys Asn Ala Gly Asn Gln Arg Gly Leu Gly Phe Asn Pro 445 450 455 460 aag ttt gct ggc aga aca acc atg aca gaa ttt gtt cct gcc aaa aac 2642 Lys Phe Ala Gly Arg Thr Thr Met Thr Glu Phe Val Pro Ala Lys Asn 465 470 475 agg act gga acc acg atg tca caa cat ctg tct cga cat cat gga acg 2690 Arg Thr Gly Thr Thr Met Ser Gln His Leu Ser Arg His His Gly Thr 480 485 490 cag cag aaa aag agc aaa aga aag aag tgg agg tgt cac tac tgt ggc 2738 Gln Gln Lys Lys Ser Lys Arg Lys Lys Trp Arg Cys His Tyr Cys Gly 495 500 505 aag tat ggt cac ata aag ccc ttt tgc tat cat cta cat ggc cat cca 2786 Lys Tyr Gly His Ile Lys Pro Phe Cys Tyr His Leu His Gly His Pro 510 515 520 cat cat gga act caa agc agc aac agc aga aag aag atg atg tgg gtt 2834 His His Gly Thr Gln Ser Ser Asn Ser Arg Lys Lys Met Met Trp Val
525 530 535 540 cca aaa cac aag gct gtc agt ctt gtt gtt cat act tca ctt aga gca 2882 Pro Lys His Lys Ala Val Ser Leu Val Val His Thr Ser Leu Arg Ala 545 550 555 tca gct aag gaa gat tgg tac cta gat agc ggc tgt tcc aga cac atg 2930 Ser Ala Lys Glu Asp Trp Tyr Leu Asp Ser Gly Cys Ser Arg His Met 560 565 570 aca gga gtc aaa gaa ttc ctg ctg aac att gag ccc tgc tcc act agt 2978 Thr Gly Val Lys Glu Phe Leu Leu Asn Ile Glu Pro Cys Ser Thr Ser 575 580 585 tat gtg aca ttt gga gat ggc tct aaa gga aag atc att gga atg gga 3026 Tyr Val Thr Phe Gly Asp Gly Ser Lys Gly Lys Ile Ile Gly Met Gly 590 595 600 aag cta gtt cat gat gga ctt cct agt ctg aac aaa gta ctg ctg gtg 3074 Lys Leu Val His Asp Gly Leu Pro Ser Leu Asn Lys Val Leu Leu Val 605 610 615 620 aag gga ctg act gca aac ttg att agc atc agt cag ctg tgt gat gaa 3122 Lys Gly Leu Thr Ala Asn Leu Ile Ser Ile Ser Gln Leu Cys Asp Glu 625 630 635 gga ttc aat gta aac ttc aca aag tca gaa tgc ttg gtg aca aat gag 3170 Gly Phe Asn Val Asn Phe Thr Lys Ser Glu Cys Leu Val Thr Asn Glu 640 645 650 aag agt gaa gtt cta atg aag ggc agc aga tca aag gac aat tgt tac 3218 Lys Ser Glu Val Leu Met Lys Gly Ser Arg Ser Lys Asp Asn Cys Tyr 655 660 665 cta tgg aca ccc caa gaa acc agc tac tcc tcc aca tgt cta tcc tcc 3266 Leu Trp Thr Pro Gln Glu Thr Ser Tyr Ser Ser Thr Cys Leu Ser Ser 670 675 680 aaa gaa gat gaa gtc aga ata tgg cat caa aga ttt gga cat ctg cac 3314 Lys Glu Asp Glu Val Arg Ile Trp His Gln Arg Phe Gly His Leu His 685 690 695 700 tta aga ggc atg aag aaa atc att gac aaa ggt gct gtt aga ggc atc 3362 Leu Arg Gly Met Lys Lys Ile Ile Asp Lys Gly Ala Val Arg Gly Ile 705 710 715 ccc aat ctg aaa ata gaa gaa ggc aga atc tgt ggt gaa tgt cag att 3410 Pro Asn Leu Lys Ile Glu Glu Gly Arg Ile Cys Gly Glu Cys Gln Ile 720 725 730 gga aag caa gtc aag atg tcc cac cag aag ctt cga cat cag acc act 3458 Gly Lys Gln Val Lys Met Ser His Gln Lys Leu Arg His Gln Thr Thr 735 740 745 tcc agg gtg ctg gaa cta ctt cac atg gat ttg atg ggg cct atg cag 3506 Ser Arg Val Leu Glu Leu Leu His Met Asp Leu Met
Gly Pro Met Gln 750 755 760 gtt gaa agt ctt gga gga aag agg tat gcc tat gtt gtt gtg gat gat 3554 Val Glu Ser Leu Gly Gly Lys Arg Tyr Ala Tyr Val Val Val Asp Asp 765 770 775 780 ttc tcc aga ttt acc tgg gta aat ttt atc aga gag aaa tca gaa acc 3602 Phe Ser Arg Phe Thr Trp Val Asn Phe Ile Arg Glu Lys Ser Glu Thr 785 790 795 ttt gaa gta ttc aaa gag ttg agt cta aga ctt caa aga gag aaa gac 3650 Phe Glu Val Phe Lys Glu Leu Ser Leu Arg Leu Gln Arg Glu Lys Asp 800 805 810 tgt gtc atc aag aga atc agg agt gac cat ggc aga gaa ttt gaa aac 3698 Cys Val Ile Lys Arg Ile Arg Ser Asp His Gly Arg Glu Phe Glu Asn 815 820 825 agc agg ttc act gaa ttc tgc aca tct gaa ggc atc act cat gag ttc 3746 Ser Arg Phe Thr Glu Phe Cys Thr Ser Glu Gly Ile Thr His Glu Phe 830 835 840 tct gca gcc att aca cca caa cag aat ggg ata gtt gag agg aaa aac 3794 Ser Ala Ala Ile Thr Pro Gln Gln Asn Gly Ile Val Glu Arg Lys Asn 845 850 855 860 agg act ttg caa gag gct gct cgg gtc atg ctt cat gcc aaa gaa ctt 3842 Arg Thr Leu Gln Glu Ala Ala Arg Val Met Leu His Ala Lys Glu Leu 865 870 875 ccc tat aat ctc tgg gct gaa gcc atg aac aca gca tgc tac atc cac 3890 Pro Tyr Asn Leu Trp Ala Glu Ala Met Asn Thr Ala Cys Tyr Ile His 880 885 890 aac aga gtc aca ctg aga aga gga act cca acc acc ctg tat gaa atc 3938 Asn Arg Val Thr Leu Arg Arg Gly Thr Pro Thr Thr Leu Tyr Glu Ile 895 900 905 tgg aaa ggg agg aag cca tct gtc aag cac ttc cac atc ttt gga agt 3986 Trp Lys Gly Arg Lys Pro Ser Val Lys His Phe His Ile Phe Gly Ser 910 915 920 cca tgt tac atc ttg gca gat aga gag caa aga aga aag atg gat ccc 4034 Pro Cys Tyr Ile Leu Ala Asp Arg Glu Gln Arg Arg Lys Met Asp Pro 925 930 935 940 aag agt gat gca gga ata ttc ctg gga tac tct aca aac agc aga gca 4082 Lys Ser Asp Ala Gly Ile Phe Leu Gly Tyr Ser Thr Asn Ser Arg Ala 945 950 955 tat aga gta ttc aat tcc aga acc aga aca gtg atg gaa tcc atc aat 4130 Tyr Arg Val Phe Asn Ser Arg Thr Arg Thr Val Met Glu Ser Ile Asn 960 965 970 gtg gtt gtt gat gat ctg tct cca gca aga aag aag gat gtc gaa gaa 4178 Val Val Val Asp Asp Leu Ser Pro
Ala Arg Lys Lys Asp Val Glu Glu 975 980 985 gat gtc aga aca ttg gga gac aat gta gca gat gca gct aaa agt gga 4226 Asp Val Arg Thr Leu Gly Asp Asn Val Ala Asp Ala Ala Lys Ser Gly 990 995 1000 gaa aat gca gaa aac tct gat tct gct aca gat gaa tca aac atc 4271 Glu Asn Ala Glu Asn Ser Asp Ser Ala Thr Asp Glu Ser Asn Ile 1005 1010 1015 aac caa ccc gac aag aga tcc tcc act aga atc cag aag atg cac 4316 Asn Gln Pro Asp Lys Arg Ser Ser Thr Arg Ile Gln Lys Met His 1020 1025 1030 ccc aag gag ctg att ata gga gat cca aac aga ggg gtc act aca 4361 Pro Lys Glu Leu Ile Ile Gly Asp Pro Asn Arg Gly Val Thr Thr 1035 1040 1045 aga tca agg gag gtt gag atc gtc tca aac tca tgt ttt gtc tcc 4406 Arg Ser Arg Glu Val Glu Ile Val Ser Asn Ser Cys Phe Val Ser 1050 1055 1060 aaa att gag ccc aag aat gtg aaa gag gca ctg aca gat gag ttc 4451 Lys Ile Glu Pro Lys Asn Val Lys Glu Ala Leu Thr Asp Glu Phe 1065 1070 1075 tgg atc aat gct atg caa gaa gaa ttg gag caa ttc aaa agg aat 4496 Trp Ile Asn Ala Met Gln Glu Glu Leu Glu Gln Phe Lys Arg Asn 1080 1085 1090 gaa gtc tgg gag cta gtt cct agg cct gag gga act aat gtg att 4541 Glu Val Trp Glu Leu Val Pro Arg Pro Glu Gly Thr Asn Val Ile 1095 1100 1105 ggc acc aag tgg atc ttc aag aac aaa acc aat gaa gaa ggt gtc 4586 Gly Thr Lys Trp Ile Phe Lys Asn Lys Thr Asn Glu Glu Gly Val 1110 1115 1120 ata acc aga aac aag gcc aga ctg gtt gct caa ggc tac act cag 4631 Ile Thr Arg Asn Lys Ala Arg Leu Val Ala Gln Gly Tyr Thr Gln 1125 1130 1135 att gaa ggt gta gac ttt gac gag act ttt gcc cca gtt gct aga 4676 Ile Glu Gly Val Asp Phe Asp Glu Thr Phe Ala Pro Val Ala Arg 1140 1145 1150 ctt gag tcc atc aga tta tta ctt ggt gta gct tgt atc ctc aaa 4721 Leu Glu Ser Ile Arg Leu Leu Leu Gly Val Ala Cys Ile Leu Lys 1155 1160 1165 ttc aag ctg tac cag atg gat gtg aag agc gca ttt ctg aat gga 4766 Phe Lys Leu Tyr Gln Met Asp Val Lys Ser Ala Phe Leu Asn Gly 1170 1175 1180 tac ctg aat gaa gaa gtc tat gtg gag cag cca aag gga ttt gca 4811 Tyr Leu Asn Glu Glu Val Tyr Val Glu Gln Pro Lys Gly Phe Ala 1185 1190 1195 gac ccg act cat
cca gat cat gta tac agg ctc aag aag gct ctc 4856 Asp Pro Thr His Pro Asp His Val Tyr Arg Leu Lys Lys Ala Leu 1200 1205 1210 tat gga ttg aag caa gct cca aga gct tgg tat gaa agg cta aca 4901 Tyr Gly Leu Lys Gln Ala Pro Arg Ala Trp Tyr Glu Arg Leu Thr 1215 1220 1225 gag ttc ctt act cag caa ggg tat agg aag gga gga att gac aag 4946 Glu Phe Leu Thr Gln Gln Gly Tyr Arg Lys Gly Gly Ile Asp Lys 1230 1235 1240 acc ctc ttt gtc aaa caa gat gct gaa aac ttg atg att gca cag 4991 Thr Leu Phe Val Lys Gln Asp Ala Glu Asn Leu Met Ile Ala Gln 1245 1250 1255 ata tat gtt gat gac att gtg ttt gga ggg atg tcg aat gag atg 5036 Ile Tyr Val Asp Asp Ile Val Phe Gly Gly Met Ser Asn Glu Met 1260 1265 1270 ctt cga cat ttt gtt caa cag atg caa tct gaa ttt gag atg agt 5081 Leu Arg His Phe Val Gln Gln Met Gln Ser Glu Phe Glu Met Ser 1275 1280 1285 ctt gtt gga gag ctg act tat ttt ctg gga ctt caa gtg aag cag 5126 Leu Val Gly Glu Leu Thr Tyr Phe Leu Gly Leu Gln Val Lys Gln 1290 1295 1300 atg gag gac tcc ata ttc ctc tca caa agc agg tat gca aag aac 5171 Met Glu Asp Ser Ile Phe Leu Ser Gln Ser Arg Tyr Ala Lys Asn 1305 1310 1315 att gtc aag aag ttt ggg atg gag aat gcc agt cat aaa agg aca 5216 Ile Val Lys Lys Phe Gly Met Glu Asn Ala Ser His Lys Arg Thr 1320 1325 1330 cct gca cct act cac ttg aag ctg tca aag gat gaa gca ggc acc 5261 Pro Ala Pro Thr His Leu Lys Leu Ser Lys Asp Glu Ala Gly Thr 1335 1340 1345 agt gtt gat caa agt ctg tac aga agc atg ata ggg agc tta cta 5306 Ser Val Asp Gln Ser Leu Tyr Arg Ser Met Ile Gly Ser Leu Leu 1350 1355 1360 tat tta aca gct agc aga ccc gac atc acc tat gca gta ggt gtt 5351 Tyr Leu Thr Ala Ser Arg Pro Asp Ile Thr Tyr Ala Val Gly Val 1365 1370 1375 tgt gca aga tat caa gcc aat ccg aag ata agt cac ttg act caa 5396 Cys Ala Arg Tyr Gln Ala Asn Pro Lys Ile Ser His Leu Thr Gln 1380 1385 1390 gta aag aga att ctg aaa tat gta aat ggc act agt gac tat ggg 5441 Val Lys Arg Ile Leu Lys Tyr Val Asn Gly Thr Ser Asp Tyr Gly 1395 1400 1405 att atg tac tgt cat tgt tca aat cca atg ctg gtt ggg tat tgt 5486 Ile Met Tyr Cys His
Cys Ser Asn Pro Met Leu Val Gly Tyr Cys 1410 1415 1420 gat gct gat tgg gct gga agt gca gat gac aga aaa agc act tct 5531 Asp Ala Asp Trp Ala Gly Ser Ala Asp Asp Arg Lys Ser Thr Ser 1425 1430 1435 ggt gga tgc ttc tat ttg gga aac aac ctt att tca tgg ttc agc 5576 Gly Gly Cys Phe Tyr Leu Gly Asn Asn Leu Ile Ser Trp Phe Ser 1440 1445 1450 aag aag cag aac tgt gtg tcc cta tct aca gca gaa gcc gag tat 5621 Lys Lys Gln Asn Cys Val Ser Leu Ser Thr Ala Glu Ala Glu Tyr 1455 1460 1465 att gca gca gga agc agc tgt tca cag cta gtt tgg atg aag cag 5666 Ile Ala Ala Gly Ser Ser Cys Ser Gln Leu Val Trp Met Lys Gln 1470 1475 1480 atg ctg aag gag tac aat gtc gaa caa gat gtc atg aca ttg tac 5711 Met Leu Lys Glu Tyr Asn Val Glu Gln Asp Val Met Thr Leu Tyr 1485 1490 1495 tgt gac aac atg agt gct att aat att tct aaa aat cct gtt caa 5756 Cys Asp Asn Met Ser Ala Ile Asn Ile Ser Lys Asn Pro Val Gln 1500 1505 1510 cac agc aga acc aag cac att gac att aga cat cac tat atc aga 5801 His Ser Arg Thr Lys His Ile Asp Ile Arg His His Tyr Ile Arg 1515 1520 1525 gat ctt gtt gat gat aaa gtg atc aca ctg aag cat gtt gac act 5846 Asp Leu Val Asp Asp Lys Val Ile Thr Leu Lys His Val Asp Thr 1530 1535 1540 gag gaa caa ata gca gat att ttc aca aag gct ttg gat gca aat 5891 Glu Glu Gln Ile Ala Asp Ile Phe Thr Lys Ala Leu Asp Ala Asn 1545 1550 1555 cag ttt gaa aaa ctg agg ggc aag ctg ggc att tgt ttg cta gaa 5936 Gln Phe Glu Lys Leu Arg Gly Lys Leu Gly Ile Cys Leu Leu Glu 1560 1565 1570 gaa tta tag caa cta cag caa tct gaa cgt gcc caa acg aat cac 5981 Glu Leu Gln Leu Gln Gln Ser Glu Arg Ala Gln Thr Asn His 1575 1580 1585 tta aca tta ata gca cgt tca cca caa agc aaa ttc gac cgt tgc 6026 Leu Thr Leu Ile Ala Arg Ser Pro Gln Ser Lys Phe Asp Arg Cys 1590 1595 1600 ctc aca cgc ccc tct aca ttc ttc att caa att tat atc tgc ttg 6071 Leu Thr Arg Pro Ser Thr Phe Phe Ile Gln Ile Tyr Ile Cys Leu 1605 1610 1615 gca ttc gtg ttt tca cca gca ttt tcc aat aat tct ctg aga ttt 6116 Ala Phe Val Phe Ser Pro Ala Phe Ser Asn Asn Ser Leu Arg Phe 1620 1625 1630 acg aaa tca ttc
caa acg ctc tgt ttt tcc atg gct acc tca cca 6161 Thr Lys Ser Phe Gln Thr Leu Cys Phe Ser Met Ala Thr Ser Pro 1635 1640 1645 aaa gaa act gca gct tct ggt tca cca tct gtc ccg tca tct cca 6206 Lys Glu Thr Ala Ala Ser Gly Ser Pro Ser Val Pro Ser Ser Pro 1650 1655 1660 cac cag gaa caa cct gaa ttc aac atc caa ccc atc caa att att 6251 His Gln Glu Gln Pro Glu Phe Asn Ile Gln Pro Ile Gln Ile Ile 1665 1670 1675 cct ggt caa gcc tct gtc cct gag aaa ctg gtt ccc aga aga cca 6296 Pro Gly Gln Ala Ser Val Pro Glu Lys Leu Val Pro Arg Arg Pro 1680 1685 1690 cag gga gtg aag att gct gaa aac cct agc cct gca acg agt cct 6341 Gln Gly Val Lys Ile Ala Glu Asn Pro Ser Pro Ala Thr Ser Pro 1695 1700 1705 agg gaa gta gac acg gag atg gac aag aaa ata cgc agc att gtg 6386 Arg Glu Val Asp Thr Glu Met Asp Lys Lys Ile Arg Ser Ile Val 1710 1715 1720 agt agc att ttg aaa gac gcc tct gtg cct gaa gct gat gaa gat 6431 Ser Ser Ile Leu Lys Asp Ala Ser Val Pro Glu Ala Asp Glu Asp 1725 1730 1735 gtc cca aca tcg tcc aac cca gat gtt tcg gtg cct gat gtc aag 6476 Val Pro Thr Ser Ser Asn Pro Asp Val Ser Val Pro Asp Val Lys 1740 1745 1750 aaa gat gtt cca aca tct tcc gct cca aat gct gaa gca ctc cct 6521 Lys Asp Val Pro Thr Ser Ser Ala Pro Asn Ala Glu Ala Leu Pro 1755 1760 1765 tca ccc agt gaa gag gga tca act gag gaa gat gat caa gcc gca 6566 Ser Pro Ser Glu Glu Gly Ser Thr Glu Glu Asp Asp Gln Ala Ala 1770 1775 1780 gag gag act cct gca cca cgg gca cca gaa cct gct cca ggt gat 6611 Glu Glu Thr Pro Ala Pro Arg Ala Pro Glu Pro Ala Pro Gly Asp 1785 1790 1795 ctc att gac tta gaa gaa gtc gaa tct gat gaa gaa ccc att gcc 6656 Leu Ile Asp Leu Glu Glu Val Glu Ser Asp Glu Glu Pro Ile Ala 1800 1805 1810 aac cgg ttg gca cct ggc att gca gaa agg tta caa agc aga aaa 6701 Asn Arg Leu Ala Pro Gly Ile Ala Glu Arg Leu Gln Ser Arg Lys 1815 1820 1825 ggg aag acc ccc att aag agg tct gga cga atc aaa aca atg gcc 6746 Gly Lys Thr Pro Ile Lys Arg Ser Gly Arg Ile Lys Thr Met Ala 1830 1835 1840 cag aag aag agt act cca atc act cct gcc aca tcc aga aga agc 6791 Gln Lys Lys Ser Thr
Pro Ile Thr Pro Ala Thr Ser Arg Arg Ser 1845 1850 1855 aag gtt gct atc ccc tcc aag aag agg aaa gaa att tcg tca tcc 6836 Lys Val Ala Ile Pro Ser Lys Lys Arg Lys Glu Ile Ser Ser Ser 1860 1865 1870 gat tct gat aag gat gtc gaa cta gat gtc tcg aca tct aag aag 6881 Asp Ser Asp Lys Asp Val Glu Leu Asp Val Ser Thr Ser Lys Lys 1875 1880 1885 gcc aag act tca ggg aaa aag gtg cct gga aat gtc cct gat gca 6926 Ala Lys Thr Ser Gly Lys Lys Val Pro Gly Asn Val Pro Asp Ala 1890 1895 1900 cca ttg gac aac atc tct ttc cac tcc att ggc aat gtt gaa aag 6971 Pro Leu Asp Asn Ile Ser Phe His Ser Ile Gly Asn Val Glu Lys 1905 1910 1915 tgg aaa tat gtg tat caa cgc aga ctt gcg gtt gag aga gaa ctg 7016 Trp Lys Tyr Val Tyr Gln Arg Arg Leu Ala Val Glu Arg Glu Leu 1920 1925 1930 gga aga gat gcc ttg gat tgc aag gag atc atg gac ctc atc aag 7061 Gly Arg Asp Ala Leu Asp Cys Lys Glu Ile Met Asp Leu Ile Lys 1935 1940 1945 gct gct gga ctg ctg aag act gtc agc aag ttg gga gat tgc tat 7106 Ala Ala Gly Leu Leu Lys Thr Val Ser Lys Leu Gly Asp Cys Tyr 1950 1955 1960 gaa ggc tta gtc agg gaa ttc att gtc aac att ccc tct gac ata 7151 Glu Gly Leu Val Arg Glu Phe Ile Val Asn Ile Pro Ser Asp Ile 1965 1970 1975 tca aac aga aaa agt gat gat tat caa aga gtg ttt gtc aga gga 7196 Ser Asn Arg Lys Ser Asp Asp Tyr Gln Arg Val Phe Val Arg Gly 1980 1985 1990 aag tgt gtt aga ttc tcc cct gct gtg att aac aaa tat ctg ggc 7241 Lys Cys Val Arg Phe Ser Pro Ala Val Ile Asn Lys Tyr Leu Gly 1995 2000 2005 aga cct act gat gga gtg ata gat att gat gtt tct gag cat caa 7286 Arg Pro Thr Asp Gly Val Ile Asp Ile Asp Val Ser Glu His Gln 2010 2015 2020 att gcc aag gaa atc act gcc aaa cga gtc cag cat tgg cca aag 7331 Ile Ala Lys Glu Ile Thr Ala Lys Arg Val Gln His Trp Pro Lys 2025 2030 2035 aaa ggg aag ctt tca gca gga aag cta agt gtg aag tat gca att 7376 Lys Gly Lys Leu Ser Ala Gly Lys Leu Ser Val Lys Tyr Ala Ile 2040 2045 2050 ctg cac agg att gga gct gca aac tgg gtt ccc acc aat cat act 7421 Leu His Arg Ile Gly Ala Ala Asn Trp Val Pro Thr Asn His Thr 2055 2060 2065 tcc act gtt
gcc aca ggt ttg ggt aaa ttt ctg tat gct gtt gga 7466 Ser Thr Val Ala Thr Gly Leu Gly Lys Phe Leu Tyr Ala Val Gly 2070 2075 2080 acc aaa tcc aaa ttt aat ttt gga aac tat atc ttt gat caa act 7511 Thr Lys Ser Lys Phe Asn Phe Gly Asn Tyr Ile Phe Asp Gln Thr 2085 2090 2095 gtt aag cat tca gaa tct ttt gct atc aaa tta ccc att gcc ttc 7556 Val Lys His Ser Glu Ser Phe Ala Ile Lys Leu Pro Ile Ala Phe 2100 2105 2110 cct act gta ttg tgt ggc att atg ttg agt cag cat ccc aat atg 7601 Pro Thr Val Leu Cys Gly Ile Met Leu Ser Gln His Pro Asn Met 2115 2120 2125 tta aac tac act gac tct gtg atg aag aga gaa tct cct cta tcc 7646 Leu Asn Tyr Thr Asp Ser Val Met Lys Arg Glu Ser Pro Leu Ser 2130 2135 2140 ctg cat tac aaa ctg ttt gaa ggg aca cat gtc cca gac att gtc 7691 Leu His Tyr Lys Leu Phe Glu Gly Thr His Val Pro Asp Ile Val 2145 2150 2155 tcg aca tct gtc tcg aca tca ggg aaa gct gct gct tca ggt gct 7736 Ser Thr Ser Val Ser Thr Ser Gly Lys Ala Ala Ala Ser Gly Ala 2160 2165 2170 gtg tcc aag gat gct ctg att gct gaa ctc aag gac aca tgc aag 7781 Val Ser Lys Asp Ala Leu Ile Ala Glu Leu Lys Asp Thr Cys Lys 2175 2180 2185 gtg ctg gaa gca acc atc aaa gcc acc aca gag aag aag atg gag 7826 Val Leu Glu Ala Thr Ile Lys Ala Thr Thr Glu Lys Lys Met Glu 2190 2195 2200 cta gaa ctg ctg atc aaa agg ctc tca gag agt ggc att gat gat 7871 Leu Glu Leu Leu Ile Lys Arg Leu Ser Glu Ser Gly Ile Asp Asp 2205 2210 2215 gaa gaa gca gct gag gaa gaa gga gaa gca gct gaa gaa gaa gaa 7916 Glu Glu Ala Ala Glu Glu Glu Gly Glu Ala Ala Glu Glu Glu Glu 2220 2225 2230 gaa gct gct gag gaa gag gaa gat gca gca gaa gag aca gaa tcc 7961 Glu Ala Ala Glu Glu Glu Glu Asp Ala Ala Glu Glu Thr Glu Ser 2235 2240 2245 gat gat gat tct gaa gcc acc cca tgatcatcag acctttaatt 8005 Asp Asp Asp Ser Glu Ala Thr Pro 2250 2255 ttgtttttac ttttattaga tataggggca tgttcctttg aacaattcat tgttattggt 8065 ctgtactatt tgcacattaa tttcatgcat cctacttttg ccaaatttat gtctaaaaag 8125 ggggagtaat agtattatgc ttgctattat gcatgattct gagtagtagg atactatgta 8185 tgatgtatgg cagtaggaaa cgatgtatgc atgattcatg
attttgaggg ggagactgct 8245 gctgctgatg atgactgatg attgatgtaa gctactagaa gatgctgcag taggagcatg 8305 aagacagggg gagcagatag cggatgtcac atgagatgtc tcgacatcct gcgaaaagac 8365 tagtagctga tagaagatga agcagtaagc atggagacag ggggagcaga agcagaaagc 8425 tgatgtcacg cgagatgtct tgacatcctg gagaagactt gtagattagc aacttgaaga 8485 atttccgctg tgcttgatta ctctgaaaat ggaagttgct gattccacat gcataactgc 8545 tcgtacctgc tcaggaagtg tctaagtatg ttttagacaa aatttgccaa agggggagat 8605 tgttagtgct tagcactact gagtttaaaa aggttggcta agattttgtt aaaacataag 8665 cacttagaca atgaaggaaa gctggagttg ctgcacatga tgtccaacgt tatgtcaagg 8725 aataagatcg ggctgcataa tgcacaaggc aagataaagt gtcaagtgat gaattgaagt 8785 tgaaggatcc acgatgtcgg atacaatgtc ctgacatcct gctcgagaat actggaagtg 8845 ctgtacaatg caagataaaa gtcaagtgaa gcattgaagc tgcaggatcc aagatgtcgg 8905 atacgatgtc ctgacatctg gcccgataat actggacata taaatctgtt atatctttaa 8965 cagattattg tgcagttagc aagagattag aagatctatc tttaggaacg aattaaaaga 9025 tcattaaagt tcgaatttca aagtagaaga gttcgttcag ggattaaaga ttaaagatta 9085 aagattcaaa ctaaaagatc aaaagttatc ttttagttct ttaactgcag atttttcaga 9145 agaagataga tctcctccag catcaagaac ttgcagccca gaatcgtaca cggctatata 9205 atcatggagg ctgcacgagt tctgtaccga gtccgggata aagagttatt ttgtgagttt 9265 tgggacttga gggtttttgt gagccacctt gatggtatac taacatcaag tgttggacct 9325 gattgtgtaa gttgatctct attgggtagg gttgatccct ttgtacagag ttgatccgag 9385 tcgacgccct ataa 9399 94 1576 PRT Glycine max misc_feature Soybean retroelement SIRE1 9 94 Met Asn Met Glu Lys Glu Gly Gly Pro Val Asn Arg Pro Pro Ile Leu 1 5 10 15 Asp Gly Ser Asn Tyr Glu Tyr Trp Lys Ala Arg Met Val Ala Phe Leu 20 25 30 Lys Ser Leu Asp Ser Arg Thr Trp Lys Ala Val Ile Lys Gly Trp Glu 35 40 45 His Pro Lys Met Leu Asp Thr Glu Gly Lys Pro Thr Asp Glu Leu Lys 50 55 60 Pro Glu Glu Asp Trp Thr Lys Glu Glu Asp Glu Leu Ala Leu Gly Asn 65 70 75 80 Ser Lys Ala Leu Asn Ala Leu Phe Asn Gly Val Asp Lys Asn Ile Phe 85 90 95 Arg Leu Ile Asn Thr Cys Thr Val Ala Lys Asp Ala Trp Glu Ile Leu 100 105 110 Lys Ile Thr His Glu Gly Thr Ser Lys Val Lys
Met Ser Arg Leu Gln 115 120 125 Leu Leu Ala Thr Lys Phe Glu Asn Leu Lys Met Lys Glu Glu Glu Cys 130 135 140 Ile His Asp Phe His Met Asn Ile Leu Glu Ile Ala Asn Ala Cys Thr 145 150 155 160 Ala Leu Gly Glu Arg Ile Thr Asp Glu Lys Leu Val Arg Lys Ile Leu 165 170 175 Arg Ser Leu Pro Lys Arg Phe Asp Met Lys Val Thr Ala Ile Glu Glu 180 185 190 Ala Gln Asp Ile Cys Asn Met Arg Val Asp Glu Leu Ile Gly Ser Leu 195 200 205 Gln Thr Phe Glu Leu Gly Leu Ser Asp Arg Ala Glu Lys Lys Ser Lys 210 215 220 Asn Leu Ala Phe Val Ser Asn Asp Glu Gly Glu Glu Asp Glu Tyr Asp 225 230 235 240 Leu Asp Thr Asp Glu Gly Leu Thr Asn Ala Val Val Leu Leu Gly Lys 245 250 255 Gln Phe Asn Lys Val Leu Asn Arg Met Asp Lys Arg Gln Lys Pro His 260 265 270 Val Gln Asn Ile Pro Phe Asp Ile Arg Lys Gly Ser Lys Tyr Gln Lys 275 280 285 Arg Ser Asp Val Lys Pro Ser His Ser Lys Gly Ile Gln Cys His Gly 290 295 300 Cys Glu Gly Tyr Gly His Ile Ile Ala Glu Cys Pro Thr His Leu Lys 305 310 315 320 Lys His Arg Lys Gly Leu Ser Val Cys Gln Ser Asp Thr Glu Ser Glu 325 330 335 Gln Glu Ser Asp Ser Asp Arg Asp Val Asn Ala Leu Thr Gly Ile Phe 340 345 350 Glu Thr Ala Glu Asp Ser Ser Asp Thr Asp Ser Glu Ile Thr Phe Asp 355 360 365 Glu Leu Ala Ala Ser Tyr Arg Lys Leu Cys Ile Lys Ser Glu Lys Ile 370 375 380 Leu Gln Gln Glu Ala Gln Leu Lys Lys Val Ile Ala Asp Leu Glu Ala 385 390 395 400 Glu Lys Glu Ala His Glu Glu Glu Ile Ser Glu Leu Lys Gly Glu Val 405 410 415 Gly Phe Leu Asn Ser Lys Leu Glu Thr Met Lys Lys Ser Ile Lys Met 420 425 430 Leu Asn Lys Gly Ser Asp Thr Leu Asp Glu Val Leu Leu Leu Gly Lys 435 440 445 Asn Ala Gly Asn Gln Arg Gly Leu Gly Phe Asn Pro Lys Phe Ala Gly 450 455 460 Arg Thr Thr Met Thr Glu Phe Val Pro Ala Lys Asn Arg Thr Gly Thr 465 470 475 480 Thr Met Ser Gln His Leu Ser Arg His His Gly Thr Gln Gln Lys Lys 485 490 495 Ser Lys Arg Lys Lys Trp Arg Cys His Tyr Cys Gly Lys Tyr Gly His 500 505 510 Ile Lys Pro Phe Cys Tyr His Leu His Gly His Pro His His Gly Thr 515 520 525 Gln Ser Ser Asn Ser Arg Lys Lys Met Met Trp Val Pro Lys His Lys 530 535 540
Ala Val Ser Leu Val Val His Thr Ser Leu Arg Ala Ser Ala Lys Glu 545 550 555 560 Asp Trp Tyr Leu Asp Ser Gly Cys Ser Arg His Met Thr Gly Val Lys 565 570 575 Glu Phe Leu Leu Asn Ile Glu Pro Cys Ser Thr Ser Tyr Val Thr Phe 580 585 590 Gly Asp Gly Ser Lys Gly Lys Ile Ile Gly Met Gly Lys Leu Val His 595 600 605 Asp Gly Leu Pro Ser Leu Asn Lys Val Leu Leu Val Lys Gly Leu Thr 610 615 620 Ala Asn Leu Ile Ser Ile Ser Gln Leu Cys Asp Glu Gly Phe Asn Val 625 630 635 640 Asn Phe Thr Lys Ser Glu Cys Leu Val Thr Asn Glu Lys Ser Glu Val 645 650 655 Leu Met Lys Gly Ser Arg Ser Lys Asp Asn Cys Tyr Leu Trp Thr Pro 660 665 670 Gln Glu Thr Ser Tyr Ser Ser Thr Cys Leu Ser Ser Lys Glu Asp Glu 675 680 685 Val Arg Ile Trp His Gln Arg Phe Gly His Leu His Leu Arg Gly Met 690 695 700 Lys Lys Ile Ile Asp Lys Gly Ala Val Arg Gly Ile Pro Asn Leu Lys 705 710 715 720 Ile Glu Glu Gly Arg Ile Cys Gly Glu Cys Gln Ile Gly Lys Gln Val 725 730 735 Lys Met Ser His Gln Lys Leu Arg His Gln Thr Thr Ser Arg Val Leu 740 745 750 Glu Leu Leu His Met Asp Leu Met Gly Pro Met Gln Val Glu Ser Leu 755 760 765 Gly Gly Lys Arg Tyr Ala Tyr Val Val Val Asp Asp Phe Ser Arg Phe 770 775 780 Thr Trp Val Asn Phe Ile Arg Glu Lys Ser Glu Thr Phe Glu Val Phe 785 790 795 800 Lys Glu Leu Ser Leu Arg Leu Gln Arg Glu Lys Asp Cys Val Ile Lys 805 810 815 Arg Ile Arg Ser Asp His Gly Arg Glu Phe Glu Asn Ser Arg Phe Thr 820 825 830 Glu Phe Cys Thr Ser Glu Gly Ile Thr His Glu Phe Ser Ala Ala Ile 835 840 845 Thr Pro Gln Gln Asn Gly Ile Val Glu Arg Lys Asn Arg Thr Leu Gln 850 855 860 Glu Ala Ala Arg Val Met Leu His Ala Lys Glu Leu Pro Tyr Asn Leu 865 870 875 880 Trp Ala Glu Ala Met Asn Thr Ala Cys Tyr Ile His Asn Arg Val Thr 885 890 895 Leu Arg Arg Gly Thr Pro Thr Thr Leu Tyr Glu Ile Trp Lys Gly Arg 900 905 910 Lys Pro Ser Val Lys His Phe His Ile Phe Gly Ser Pro Cys Tyr Ile 915 920 925 Leu Ala Asp Arg Glu Gln Arg Arg Lys Met Asp Pro Lys Ser Asp Ala 930 935 940 Gly Ile Phe Leu Gly Tyr Ser Thr Asn Ser Arg Ala Tyr Arg Val Phe 945 950 955 960 Asn Ser Arg Thr Arg Thr Val
Met Glu Ser Ile Asn Val Val Val Asp 965 970 975 Asp Leu Ser Pro Ala Arg Lys Lys Asp Val Glu Glu Asp Val Arg Thr 980 985 990 Leu Gly Asp Asn Val Ala Asp Ala Ala Lys Ser Gly Glu Asn Ala Glu 995 1000 1005 Asn Ser Asp Ser Ala Thr Asp Glu Ser Asn Ile Asn Gln Pro Asp 1010 1015 1020 Lys Arg Ser Ser Thr Arg Ile Gln Lys Met His Pro Lys Glu Leu 1025 1030 1035 Ile Ile Gly Asp Pro Asn Arg Gly Val Thr Thr Arg Ser Arg Glu 1040 1045 1050 Val Glu Ile Val Ser Asn Ser Cys Phe Val Ser Lys Ile Glu Pro 1055 1060 1065 Lys Asn Val Lys Glu Ala Leu Thr Asp Glu Phe Trp Ile Asn Ala 1070 1075 1080 Met Gln Glu Glu Leu Glu Gln Phe Lys Arg Asn Glu Val Trp Glu 1085 1090 1095 Leu Val Pro Arg Pro Glu Gly Thr Asn Val Ile Gly Thr Lys Trp 1100 1105 1110 Ile Phe Lys Asn Lys Thr Asn Glu Glu Gly Val Ile Thr Arg Asn 1115 1120 1125 Lys Ala Arg Leu Val Ala Gln Gly Tyr Thr Gln Ile Glu Gly Val 1130 1135 1140 Asp Phe Asp Glu Thr Phe Ala Pro Val Ala Arg Leu Glu Ser Ile 1145 1150 1155 Arg Leu Leu Leu Gly Val Ala Cys Ile Leu Lys Phe Lys Leu Tyr 1160 1165 1170 Gln Met Asp Val Lys Ser Ala Phe Leu Asn Gly Tyr Leu Asn Glu 1175 1180 1185 Glu Val Tyr Val Glu Gln Pro Lys Gly Phe Ala Asp Pro Thr His 1190 1195 1200 Pro Asp His Val Tyr Arg Leu Lys Lys Ala Leu Tyr Gly Leu Lys 1205 1210 1215 Gln Ala Pro Arg Ala Trp Tyr Glu Arg Leu Thr Glu Phe Leu Thr 1220 1225 1230 Gln Gln Gly Tyr Arg Lys Gly Gly Ile Asp Lys Thr Leu Phe Val 1235 1240 1245 Lys Gln Asp Ala Glu Asn Leu Met Ile Ala Gln Ile Tyr Val Asp 1250 1255 1260 Asp Ile Val Phe Gly Gly Met Ser Asn Glu Met Leu Arg His Phe 1265 1270 1275 Val Gln Gln Met Gln Ser Glu Phe Glu Met Ser Leu Val Gly Glu 1280 1285 1290 Leu Thr Tyr Phe Leu Gly Leu Gln Val Lys Gln Met Glu Asp Ser 1295 1300 1305 Ile Phe Leu Ser Gln Ser Arg Tyr Ala Lys Asn Ile Val Lys Lys 1310 1315 1320 Phe Gly Met Glu Asn Ala Ser His Lys Arg Thr Pro Ala Pro Thr 1325 1330 1335 His Leu Lys Leu Ser Lys Asp Glu Ala Gly Thr Ser Val Asp Gln 1340 1345 1350 Ser Leu Tyr Arg Ser Met Ile Gly Ser Leu Leu Tyr Leu Thr Ala 1355 1360 1365 Ser Arg Pro Asp Ile Thr
Tyr Ala Val Gly Val Cys Ala Arg Tyr 1370 1375 1380 Gln Ala Asn Pro Lys Ile Ser His Leu Thr Gln Val Lys Arg Ile 1385 1390 1395 Leu Lys Tyr Val Asn Gly Thr Ser Asp Tyr Gly Ile Met Tyr Cys 1400 1405 1410 His Cys Ser Asn Pro Met Leu Val Gly Tyr Cys Asp Ala Asp Trp 1415 1420 1425 Ala Gly Ser Ala Asp Asp Arg Lys Ser Thr Ser Gly Gly Cys Phe 1430 1435 1440 Tyr Leu Gly Asn Asn Leu Ile Ser Trp Phe Ser Lys Lys Gln Asn 1445 1450 1455 Cys Val Ser Leu Ser Thr Ala Glu Ala Glu Tyr Ile Ala Ala Gly 1460 1465 1470 Ser Ser Cys Ser Gln Leu Val Trp Met Lys Gln Met Leu Lys Glu 1475 1480 1485 Tyr Asn Val Glu Gln Asp Val Met Thr Leu Tyr Cys Asp Asn Met 1490 1495 1500 Ser Ala Ile Asn Ile Ser Lys Asn Pro Val Gln His Ser Arg Thr 1505 1510 1515 Lys His Ile Asp Ile Arg His His Tyr Ile Arg Asp Leu Val Asp 1520 1525 1530 Asp Lys Val Ile Thr Leu Lys His Val Asp Thr Glu Glu Gln Ile 1535 1540 1545 Ala Asp Ile Phe Thr Lys Ala Leu Asp Ala Asn Gln Phe Glu Lys 1550 1555 1560 Leu Arg Gly Lys Leu Gly Ile Cys Leu Leu Glu Glu Leu 1565 1570 1575 95 680 PRT Glycine max misc_feature Soybean retroelement SIRE1 9 95 Gln Leu Gln Gln Ser Glu Arg Ala Gln Thr Asn His Leu Thr Leu Ile 1 5 10 15 Ala Arg Ser Pro Gln Ser Lys Phe Asp Arg Cys Leu Thr Arg Pro Ser 20 25 30 Thr Phe Phe Ile Gln Ile Tyr Ile Cys Leu Ala Phe Val Phe Ser Pro 35 40 45 Ala Phe Ser Asn Asn Ser Leu Arg Phe Thr Lys Ser Phe Gln Thr Leu 50 55 60 Cys Phe Ser Met Ala Thr Ser Pro Lys Glu Thr Ala Ala Ser Gly Ser 65 70 75 80 Pro Ser Val Pro Ser Ser Pro His Gln Glu Gln Pro Glu Phe Asn Ile 85 90 95 Gln Pro Ile Gln Ile Ile Pro Gly Gln Ala Ser Val Pro Glu Lys Leu 100 105 110 Val Pro Arg Arg Pro Gln Gly Val Lys Ile Ala Glu Asn Pro Ser Pro 115 120 125 Ala Thr Ser Pro Arg Glu Val Asp Thr Glu Met Asp Lys Lys Ile Arg 130 135 140 Ser Ile Val Ser Ser Ile Leu Lys Asp Ala Ser Val Pro Glu Ala Asp 145 150 155 160 Glu Asp Val Pro Thr Ser Ser Asn Pro Asp Val Ser Val Pro Asp Val 165 170 175 Lys Lys Asp Val Pro Thr Ser Ser Ala Pro Asn Ala Glu Ala Leu Pro 180 185 190 Ser Pro Ser Glu Glu Gly Ser Thr Glu Glu
Asp Asp Gln Ala Ala Glu 195 200 205 Glu Thr Pro Ala Pro Arg Ala Pro Glu Pro Ala Pro Gly Asp Leu Ile 210 215 220 Asp Leu Glu Glu Val Glu Ser Asp Glu Glu Pro Ile Ala Asn Arg Leu 225 230 235 240 Ala Pro Gly Ile Ala Glu Arg Leu Gln Ser Arg Lys Gly Lys Thr Pro 245 250 255 Ile Lys Arg Ser Gly Arg Ile Lys Thr Met Ala Gln Lys Lys Ser Thr 260 265 270 Pro Ile Thr Pro Ala Thr Ser Arg Arg Ser Lys Val Ala Ile Pro Ser 275 280 285 Lys Lys Arg Lys Glu Ile Ser Ser Ser Asp Ser Asp Lys Asp Val Glu 290 295 300 Leu Asp Val Ser Thr Ser Lys Lys Ala Lys Thr Ser Gly Lys Lys Val 305 310 315 320 Pro Gly Asn Val Pro Asp Ala Pro Leu Asp Asn Ile Ser Phe His Ser 325 330 335 Ile Gly Asn Val Glu Lys Trp Lys Tyr Val Tyr Gln Arg Arg Leu Ala 340 345 350 Val Glu Arg Glu Leu Gly Arg Asp Ala Leu Asp Cys Lys Glu Ile Met 355 360 365 Asp Leu Ile Lys Ala Ala Gly Leu Leu Lys Thr Val Ser Lys Leu Gly 370 375 380 Asp Cys Tyr Glu Gly Leu Val Arg Glu Phe Ile Val Asn Ile Pro Ser 385 390 395 400 Asp Ile Ser Asn Arg Lys Ser Asp Asp Tyr Gln Arg Val Phe Val Arg 405 410 415 Gly Lys Cys Val Arg Phe Ser Pro Ala Val Ile Asn Lys Tyr Leu Gly 420 425 430 Arg Pro Thr Asp Gly Val Ile Asp Ile Asp Val Ser Glu His Gln Ile 435 440 445 Ala Lys Glu Ile Thr Ala Lys Arg Val Gln His Trp Pro Lys Lys Gly 450 455 460 Lys Leu Ser Ala Gly Lys Leu Ser Val Lys Tyr Ala Ile Leu His Arg 465 470 475 480 Ile Gly Ala Ala Asn Trp Val Pro Thr Asn His Thr Ser Thr Val Ala 485 490 495 Thr Gly Leu Gly Lys Phe Leu Tyr Ala Val Gly Thr Lys Ser Lys Phe 500 505 510 Asn Phe Gly Asn Tyr Ile Phe Asp Gln Thr Val Lys His Ser Glu Ser 515 520 525 Phe Ala Ile Lys Leu Pro Ile Ala Phe Pro Thr Val Leu Cys Gly Ile 530 535 540 Met Leu Ser Gln His Pro Asn Met Leu Asn Tyr Thr Asp Ser Val Met 545 550 555 560 Lys Arg Glu Ser Pro Leu Ser Leu His Tyr Lys Leu Phe Glu Gly Thr 565 570 575 His Val Pro Asp Ile Val Ser Thr Ser Val Ser Thr Ser Gly Lys Ala 580 585 590 Ala Ala Ser Gly Ala Val Ser Lys Asp Ala Leu Ile Ala Glu Leu Lys 595 600 605 Asp Thr Cys Lys Val Leu Glu Ala Thr Ile Lys Ala Thr Thr Glu Lys 610
615 620 Lys Met Glu Leu Glu Leu Leu Ile Lys Arg Leu Ser Glu Ser Gly Ile 625 630 635 640 Asp Asp Glu Glu Ala Ala Glu Glu Glu Gly Glu Ala Ala Glu Glu Glu 645 650 655 Glu Glu Ala Ala Glu Glu Glu Glu Asp Ala Ala Glu Glu Thr Glu Ser 660 665 670 Asp Asp Asp Ser Glu Ala Thr Pro 675 680 96 21 DNA Artificial sequence Synthetic primer 96 tggaaggttg taaacagtgg c 21 97 19 DNA Artificial sequence Synthetic primer 97 agtcgaaagg gatgttccg 19 98 19 DNA Artificial sequence Synthetic primer 98 acattgtctc gacacaggg 19 99 17 DNA Artificial sequence Synthetic primer 99 atattttcgg gcagatg 17 100 7 DNA Artificial sequence Synthetic primer 100 tatataa 7 101 10 DNA Artificial sequence Synthetic primer 101 tggtatcaga 10 102 11 DNA Artificial sequence Synthetic primer 102 aaagggggag a 11 103 4 PRT Artificial sequence Synthetic peptide 103 Cys Cys His Cys 1 104 4 PRT Artificial sequence Synthetic peptide 104 His His Cys Cys 1 105 8 DNA Artificial sequence Synthetic primer 105 agggggag 8
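In the nucleotide listing above, each row of codons is paired with its three-letter translation, and the final Leu of SEQ ID NO: 94 (residue 1576) is followed by a TAG codon and then by the Gln Leu Gln Gln Ser residues that open SEQ ID NO: 95, so the two ORFs appear to share one reading frame. Such codon-to-residue pairings can be spot-checked with a short script. The Python sketch below is illustrative only and is not part of the disclosure: it assumes the standard genetic code (NCBI translation table 1), and the names `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical helpers introduced here.

```python
# Spot-check codon-to-residue pairings in a sequence listing.
# Assumption: standard genetic code (NCBI translation table 1).
# CODON_TABLE, THREE_LETTER and translate are hypothetical helper names.
from itertools import product

# Codons are enumerated with each position cycling through t, c, a, g
# (last position fastest); AMINO gives the one-letter residue per codon
# in that same order, with "*" marking a stop codon.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter codes, matching the style of the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(dna: str) -> list[str]:
    """Translate in-frame DNA into three-letter residue codes.

    Whitespace is ignored. A length that is not a multiple of 3 raises
    ValueError; a codon containing a non-ACGT letter raises KeyError.
    """
    seq = "".join(dna.split()).lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [THREE_LETTER[CODON_TABLE[seq[i:i + 3]]] for i in range(0, len(seq), 3)]


if __name__ == "__main__":
    # Codons spanning the end of ORF1 (SEQ ID NO: 94) and the start of
    # ORF2 (SEQ ID NO: 95), copied from the nucleotide listing above.
    junction = "ttg cta gaa gaa tta tag caa cta cag caa tct gaa"
    print(" ".join(translate(junction)))
```

Running the script prints `Leu Leu Glu Glu Leu Stop Gln Leu Gln Gln Ser Glu`. The output matches the last residues of SEQ ID NO: 94 and the first residues of SEQ ID NO: 95, with a single TAG stop codon between them.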

I claim:
1. An isolated, purified polynucleotide comprising a polynucleotide selected from the group consisting of SEQ ID NO: 87, SEQ ID NO: 90, SEQ ID NO: 93, and fragments thereof, wherein said fragments retain one or more functional properties of their respective parent polynucleotides.
2. The polynucleotide of claim 1 wherein said fragments comprise all or part of one or more SIRE1 long terminal repeats.
3. The polynucleotide of claim 1 further comprising a heterologous DNA.
4. The polynucleotide of claim 3 wherein said heterologous DNA comprises a transcriptional regulatory element.
5. A vector comprising the polynucleotide according to claim 1.
6. The vector of claim 5 further comprising a heterologous DNA.
7. The vector of claim 6 wherein said heterologous DNA comprises a transcriptional regulatory element.
8. The vector of claim 6 wherein said heterologous DNA is operably linked to a transcriptional regulatory element.
9. The vector of claim 8 wherein the heterologous DNA comprises a DNA encoding a protein conferring resistance to a plant disease.
10. The vector of claim 8 wherein said heterologous DNA comprises a DNA encoding a protein conferring resistance to insect infestation.
11. The vector of claim 8 wherein said heterologous DNA comprises a DNA encoding a protein conferring tolerance to a herbicide.
12. The vector of claim 8 wherein said heterologous DNA comprises a DNA encoding a protein conferring enhanced nitrogen fixation or nodulation.
13. The vector of claim 8 wherein said heterologous DNA comprises a DNA encoding a protein conferring enhanced vigor or growth.
14. The vector of claim 8 wherein said heterologous DNA comprises a DNA encoding a SIRE1-encoded protein.
15. The vector of claim 8 wherein said heterologous DNA comprises a gene or a fragment thereof.
16. The vector of claim 8 wherein said heterologous DNA comprises a DNA encoding an antisense transcript.
17. A method for transforming a host cell comprising the step of introducing a vector according to claims 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 into said host cell.
18. A host cell transformed by the method of claim 17.
19. The host cell according to claim 18 wherein said host cell is a plant cell.
20. The host cell according to claim 19 wherein said plant cell is a soybean cell.
21. An isolated, purified protein comprising an amino acid sequence encoded by a SIRE1 ORF1 selected from the group consisting of SEQ ID NO: 88, SEQ ID NO: 91, SEQ ID NO: 94 and fragments thereof, wherein said protein fragments retain one or more properties of their respective parent proteins.
22. The protein of claim 21 wherein said protein is a recombinant protein.
23. An isolated, purified protein comprising an amino acid sequence encoded by a SIRE1 ORF2 selected from the group consisting of SEQ ID NO: 89, SEQ ID NO: 92, SEQ ID NO: 95 and fragments thereof, wherein said protein fragments retain one or more properties of their respective parent proteins.
24. The protein of claim 21 wherein said protein is a recombinant protein.
25. A method for making a heterologous protein comprising the steps of: (a) culturing a host cell according to claim 18 under suitable medium and environmental conditions; and (b) isolating said protein from said cultured cell or from said medium.
26. An isolated, purified antibody that specifically recognizes an epitope on a protein of claim 21.
27. An isolated, purified antibody that specifically recognizes an epitope on a protein of claim 23.
28. A method for transforming a plant cell, said method comprising the steps of: (a) introducing a polynucleotide according to claim 1 into a plant cell; (b) culturing said plant cell under suitable nutrient and environmental conditions; and (c) detecting said polynucleotide in said plant cell.
29. A method for transforming a plant cell, said method comprising the steps of: (a) introducing a vector according to any one of claims 5 to 8 into a plant cell; (b) culturing said plant cell under suitable nutrient and environmental conditions for the expression of an expression product of said polynucleotide; and (c) detecting said expression product.
30. A transformed plant cell produced by the method of claim 28 or claim 29.
31. The transformed plant cell of claim 30 wherein said plant cell is a soybean cell.
32. A transgenic plant comprising a vector according to claims 5, 6, 7, or 8.
33. A method for generating a transgenic plant, the method comprising: (a) introducing a vector according to claim 6 into a plant cell and detecting the polynucleotide in the plant cell; and (b) generating a plant from the cell of step (a), wherein the plant comprises cells which contain the heterologous DNA.
34. A transgenic plant produced according to the method of claim 33 or transgenic progeny thereof that contain the heterologous DNA.